(54) Title: CAROTENOID BIOSYNTHESIS ENZYMES

(57) Abstract 

This invention relates to an isolated nucleic
acid fragment encoding a carotenoid biosynthetic enzyme.
The invention also relates to the construction
of a chimeric gene encoding all or a portion of the
carotenoid biosynthetic enzyme, in sense or antisense
orientation, wherein expression of the chimeric gene
results in production of altered levels of the carotenoid
biosynthetic enzyme in a transformed host cell.
TITLE

CAROTENOID BIOSYNTHESIS ENZYMES
This application claims the benefit of U.S. Provisional Application No. 60/083,042,
filed April 24, 1998.

FIELD OF THE INVENTION 

This invention is in the field of plant molecular biology. More specifically, this 
invention pertains to nucleic acid fragments encoding enzymes of the carotenoid 
biosynthesis pathway in plants and seeds. 

BACKGROUND OF THE INVENTION 

Plant carotenoids are orange and red lipid-soluble pigments found embedded in the
membranes of chloroplasts and chromoplasts. In leaves and immature fruits the color is
masked by chlorophyll, but in later stages of development these pigments contribute to the
bright color of flowers and fruits. Carotenoids protect against photooxidation processes and
harvest light for photosynthesis. The carotenoid biosynthesis pathway leads to the
production of abscisic acid, with intermediaries useful in the agricultural and food industries
as well as products thought to be involved in cancer prevention (Bartley, G. E. and Scolnik,
P. A. (1995) Plant Cell 7:1027-1038).

One of the intermediaries in the carotenoid biosynthesis pathway, lycopene, may
have one of two different fates: through the action of lycopene epsilon cyclase it may
become alpha carotene, or it may be transformed into beta carotene by lycopene cyclase.
Beta carotene hydroxylase converts beta carotene into zeaxanthin. Zeaxanthin epoxidase
transforms zeaxanthin into violaxanthin and eventually abscisic acid.

Zeaxanthin is the bright orange product highly prized as a pigmenting agent for
animal feed, which makes the meat fat, skin, and egg yolks a dark yellow (Scott, M. L. et al.
(1968) Poultry Sci. 47:863-872). Gram per gram, zeaxanthin is one of the best pigmenting
compounds because it is highly absorbable. Yellow corn, which produces one of the best
ratios of lutein to zeaxanthin, contains on average 20 to 25 mg of xanthophyll per kg, while
marigold petals yield 6,000 to 10,000 mg/kg.

Enzymes from the carotenoid pathway have previously been isolated from a variety
of bacteria, fungi and higher plants. There is great variety in the functions and properties
of the enzymes from different sources, and the amino acid similarity between
bacterial and plant enzymes is very low. Therefore, knowing the amino acid sequence of the
enzyme from a bacterium or a plant will not necessarily make it easy to screen for the enzyme
in another source.

SUMMARY OF THE INVENTION 
The instant invention relates to isolated nucleic acid fragments encoding carotenoid 
biosynthetic enzymes. Specifically, this invention concerns an isolated nucleic acid 
fragment encoding a beta carotene hydroxylase, a lycopene cyclase or a lycopene epsilon 
cyclase. In addition, this invention relates to a nucleic acid fragment that is complementary 
to the nucleic acid fragment encoding beta carotene hydroxylase, lycopene cyclase or
lycopene epsilon cyclase.

An additional embodiment of the instant invention pertains to a polypeptide encoding
all or a substantial portion of a carotenoid biosynthetic enzyme selected from the group
consisting of beta carotene hydroxylase, lycopene cyclase and lycopene epsilon cyclase.

In another embodiment, the instant invention relates to a chimeric gene encoding a
beta carotene hydroxylase, a lycopene cyclase or a lycopene epsilon cyclase, or to a chimeric
gene that comprises a nucleic acid fragment that is complementary to a nucleic acid
fragment encoding a beta carotene hydroxylase, a lycopene cyclase or a lycopene epsilon
cyclase, operably linked to suitable regulatory sequences, wherein expression of the
chimeric gene results in production of levels of the encoded protein in a transformed host
cell that are altered (i.e., increased or decreased) from the level produced in an untransformed
host cell.

In a further embodiment, the instant invention concerns a transformed host cell
comprising in its genome a chimeric gene encoding a beta carotene hydroxylase, a lycopene
cyclase or a lycopene epsilon cyclase, operably linked to suitable regulatory sequences.
Expression of the chimeric gene results in production of altered levels of the encoded protein
in the transformed host cell. The transformed host cell can be of eukaryotic or prokaryotic
origin, and includes cells derived from higher plants and microorganisms. The invention also
includes transformed plants that arise from transformed host cells of higher plants, and seeds
derived from such transformed plants.

An additional embodiment of the instant invention concerns a method of altering the
level of expression of a beta carotene hydroxylase, a lycopene cyclase or a lycopene epsilon
cyclase in a transformed host cell comprising: a) transforming a host cell with a chimeric
gene comprising a nucleic acid fragment encoding a beta carotene hydroxylase, a lycopene
cyclase or a lycopene epsilon cyclase; and b) growing the transformed host cell under
conditions that are suitable for expression of the chimeric gene wherein expression of the
chimeric gene results in production of altered levels of beta carotene hydroxylase, lycopene
cyclase or lycopene epsilon cyclase in the transformed host cell.

An additional embodiment of the instant invention concerns a method for obtaining a
nucleic acid fragment encoding all or a substantial portion of an amino acid sequence
encoding a beta carotene hydroxylase, a lycopene cyclase or a lycopene epsilon cyclase.

A further embodiment of the instant invention is a method for evaluating at least one
compound for its ability to inhibit the activity of a beta carotene hydroxylase, a lycopene
cyclase or a lycopene epsilon cyclase, the method comprising the steps of: (a) transforming
a host cell with a chimeric gene comprising a nucleic acid fragment encoding a beta carotene
hydroxylase, a lycopene cyclase or a lycopene epsilon cyclase, operably linked to suitable
regulatory sequences; (b) growing the transformed host cell under conditions that are
suitable for expression of the chimeric gene wherein expression of the chimeric gene results
in production of beta carotene hydroxylase, lycopene cyclase or lycopene epsilon cyclase in
the transformed host cell; (c) optionally purifying the beta carotene hydroxylase, lycopene
cyclase or lycopene epsilon cyclase expressed by the transformed host cell; (d) treating the
beta carotene hydroxylase, lycopene cyclase or lycopene epsilon cyclase with a compound to
be tested; and (e) comparing the activity of the beta carotene hydroxylase, lycopene cyclase
or lycopene epsilon cyclase that has been treated with a test compound to the activity of an
untreated beta carotene hydroxylase, lycopene cyclase, or lycopene epsilon cyclase, thereby
selecting compounds with potential for inhibitory activity.

BRIEF DESCRIPTION OF THE
DRAWINGS AND SEQUENCE DESCRIPTIONS

The invention can be more fully understood from the following detailed description 
and the accompanying drawings and Sequence Listing which form a part of this application. 

Figure 1 depicts the amino acid sequence alignment between the beta carotene
hydroxylase from a corn contig assembled from clones csil.pk0019.gl and csiln.pk0035.f7
(SEQ ID NO:2), a rice contig assembled from clones rl0n.pkl02.kl9 and rls72.pk0008.e3
(SEQ ID NO:4), soybean clone sfll.pk0049.g8 (SEQ ID NO:6), wheat clone
wkm2n.pk008.a5 (SEQ ID NO:8), and Arabidopsis thaliana (NCBI gi Accession
No. 1575296, SEQ ID NO:21) and Capsicum annuum (NCBI gi Accession No. 2956717,
SEQ ID NO:22) beta carotene hydroxylases. Amino acids which are conserved among all
sequences are indicated with an asterisk (*). Dashes are used by the program to maximize
alignment of the sequences.

Figure 2 depicts the amino acid sequence alignment between the lycopene cyclase
from a corn contig assembled from clones p0110.cgsmy23r, p0110.cgsmj48r and
crln.pk0051.h6 (SEQ ID NO:10), soybean clone sfll.pk0034.cl (SEQ ID NO:12), wheat
clone wleln.pk0059.f5 (SEQ ID NO:14) and Arabidopsis thaliana (NCBI gi Accession
No. 735882, SEQ ID NO:23). Amino acids which are conserved among all sequences are
indicated with an asterisk (*). Dashes are used by the program to maximize alignment of the
sequences.

Figure 3 depicts the amino acid sequence alignment between the lycopene epsilon
cyclase from corn clone cen3n.pk0135.fl0 (SEQ ID NO:16), a corn contig assembled from
clones p0126.cnldl94r and p0126.cnldjl2r (SEQ ID NO:18), soybean clone srl.pk0068.bl
(SEQ ID NO:20) and Lycopersicon esculentum (NCBI gi Accession No. 3005983, SEQ ID
NO:24). Dashes are used by the program to maximize alignment of the sequences.

The following sequence descriptions and Sequence Listing attached hereto comply
with the rules governing nucleotide and/or amino acid sequence disclosures in patent
applications as set forth in 37 C.F.R. §1.821-1.825.

SEQ ID NO:1 is the nucleotide sequence comprising the contig assembled from the
entire cDNA insert in clone csiln.pk0035.f7 and a portion of the cDNA insert in clone
csil.pk0019.gl encoding the C-terminal three quarters of a corn beta carotene hydroxylase.

SEQ ID NO:2 is the deduced amino acid sequence of the C-terminal three quarters of a
corn beta carotene hydroxylase derived from the nucleotide sequence of SEQ ID NO:1.

SEQ ID NO:3 is the nucleotide sequence comprising the contig assembled from the
entire cDNA insert in clone rls72.pk0008.e3 and a portion of the cDNA insert in clone
rl0n.pkl02.kl9 encoding a portion of a rice beta carotene hydroxylase.

SEQ ID NO:4 is the deduced amino acid sequence of a portion of a rice beta carotene 
hydroxylase derived from the nucleotide sequence of SEQ ID NO:3. 

SEQ ID NO:5 is the nucleotide sequence comprising the entire cDNA insert in clone
sfll.pk0049.g8 encoding an entire soybean beta carotene hydroxylase.

SEQ ID NO:6 is the deduced amino acid sequence of an entire soybean beta carotene
hydroxylase derived from the nucleotide sequence of SEQ ID NO:5.

SEQ ID NO:7 is the nucleotide sequence comprising a portion of the cDNA insert in
clone wkm2n.pk008.a5 encoding the C-terminal half of a wheat beta carotene hydroxylase.

SEQ ID NO:8 is the deduced amino acid sequence of the C-terminal half of a wheat
beta carotene hydroxylase derived from the nucleotide sequence of SEQ ID NO:7.

SEQ ID NO:9 is the nucleotide sequence comprising the contig assembled from the
entire cDNA insert in clone crln.pk0051.h6 and a portion of the cDNA insert in clones
p0110.cgsmy23r and p0110.cgsmj48r encoding a substantial portion of a corn lycopene
cyclase.

SEQ ID NO:10 is the deduced amino acid sequence of a substantial portion of a corn
lycopene cyclase derived from the nucleotide sequence of SEQ ID NO:9.

SEQ ID NO:11 is the nucleotide sequence comprising the entire cDNA insert in clone
sfll.pk0034.cl encoding an entire soybean lycopene cyclase.

SEQ ID NO:12 is the deduced amino acid sequence of an entire soybean lycopene
cyclase derived from the nucleotide sequence of SEQ ID NO:11.

SEQ ID NO:13 is the nucleotide sequence comprising the entire cDNA insert in clone
wleln.pk0059.f5 encoding a portion of a wheat lycopene cyclase.

SEQ ID NO:14 is the deduced amino acid sequence of a portion of a wheat lycopene
cyclase derived from the nucleotide sequence of SEQ ID NO:13.

SEQ ID NO:15 is the nucleotide sequence comprising the entire cDNA insert in clone
cen3n.pk0135.fl0 encoding a portion of a corn lycopene epsilon cyclase.

SEQ ID NO:16 is the deduced amino acid sequence of a portion of a corn lycopene
epsilon cyclase derived from the nucleotide sequence of SEQ ID NO:15.

SEQ ID NO:17 is the nucleotide sequence comprising the contig assembled from a
portion of the cDNA insert in clones p0126.cnldl94r and p0126.cnldjl2r encoding a portion
of a corn lycopene epsilon cyclase.

SEQ ID NO:18 is the deduced amino acid sequence of a portion of a corn lycopene
epsilon cyclase derived from the nucleotide sequence of SEQ ID NO:17.

SEQ ID NO:19 is the nucleotide sequence comprising the entire cDNA insert in clone
srl.pk0068.bl encoding the C-terminal two thirds of a soybean lycopene epsilon cyclase.

SEQ ID NO:20 is the deduced amino acid sequence of the C-terminal two thirds of a
soybean lycopene epsilon cyclase derived from the nucleotide sequence of SEQ ID NO:19.

SEQ ID NO:21 is the amino acid sequence of an Arabidopsis thaliana beta carotene
hydroxylase, NCBI gi Accession No. 1575296.

SEQ ID NO:22 is the amino acid sequence of a Capsicum annuum beta carotene
hydroxylase, NCBI gi Accession No. 2956717.

SEQ ID NO:23 is the amino acid sequence of an Arabidopsis thaliana lycopene
cyclase, NCBI gi Accession No. 735882.

SEQ ID NO:24 is the amino acid sequence of a Lycopersicon esculentum lycopene
epsilon cyclase, NCBI gi Accession No. 3005983.

The Sequence Listing contains the one letter code for nucleotide sequence characters
and the three letter codes for amino acids as defined in conformity with the IUPAC-IUBMB
standards described in Nucleic Acids Research 13:3021-3030 (1985) and in the Biochemical
Journal 219 (No. 2):345-373 (1984), which are herein incorporated by reference. The
symbols and format used for nucleotide and amino acid sequence data comply with the rules
set forth in 37 C.F.R. §1.822.

DETAILED DESCRIPTION OF THE INVENTION
In the context of this disclosure, a number of terms shall be utilized. As used herein,
an "isolated nucleic acid fragment" is a polymer of RNA or DNA that is single- or double-stranded,
optionally containing synthetic, non-natural or altered nucleotide bases. An
isolated nucleic acid fragment in the form of a polymer of DNA may be comprised of one or
more segments of cDNA, genomic DNA or synthetic DNA. As used herein, "contig" refers
to an assemblage of overlapping nucleic acid sequences to form one contiguous nucleotide
sequence. For example, several DNA sequences can be compared and aligned to identify
common or overlapping regions. The individual sequences can then be assembled into a
single contiguous nucleotide sequence.
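
The contig definition above can be illustrated with a minimal sketch. It assumes exact (error-free) overlaps between the end of one sequence and the start of the next; the function names and the example reads are hypothetical and are not taken from the Sequence Listing, and real assembly software tolerates mismatches and handles both strands.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_pair(left: str, right: str, min_overlap: int = 3):
    """Join two overlapping sequences into one contiguous sequence, or return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep the shared region once, then append the remainder of `right`.
    return left + right[size:]


# Hypothetical reads sharing the five-base region "CAGGT".
read_a = "ATGGCTTCAGGT"
read_b = "CAGGTACCTTGA"
contig = merge_pair(read_a, read_b)
print(contig)
```

Here the two reads are joined through their shared region so that the overlap appears only once in the resulting contiguous sequence.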

As used herein, "substantially similar" refers to nucleic acid fragments wherein
changes in one or more nucleotide bases result in substitution of one or more amino acids,
but do not affect the functional properties of the protein encoded by the DNA sequence.
"Substantially similar" also refers to nucleic acid fragments wherein changes in one or more
nucleotide bases do not affect the ability of the nucleic acid fragment to mediate alteration
of gene expression by antisense or co-suppression technology. "Substantially similar" also
refers to modifications of the nucleic acid fragments of the instant invention such as deletion
or insertion of one or more nucleotides that do not substantially affect the functional
properties of the resulting transcript vis-a-vis the ability to mediate alteration of gene
expression by antisense or co-suppression technology or alteration of the functional
properties of the resulting protein molecule. It is therefore understood that the invention
encompasses more than the specific exemplary sequences.

For example, it is well known in the art that antisense suppression and co-suppression
of gene expression may be accomplished using nucleic acid fragments representing less than
the entire coding region of a gene, and by nucleic acid fragments that do not share 100%
sequence identity with the gene to be suppressed. Moreover, alterations in a gene which
result in the production of a chemically equivalent amino acid at a given site, but do not
affect the functional properties of the encoded protein, are well known in the art. Thus, a
codon for the amino acid alanine, a hydrophobic amino acid, may be substituted by a codon
encoding another less hydrophobic residue, such as glycine, or a more hydrophobic residue,
such as valine, leucine, or isoleucine. Similarly, changes which result in substitution of one
negatively charged residue for another, such as aspartic acid for glutamic acid, or one
positively charged residue for another, such as lysine for arginine, can also be expected to
produce a functionally equivalent product. Nucleotide changes which result in alteration of
the N-terminal and C-terminal portions of the protein molecule would also not be expected
to alter the activity of the protein. Each of the proposed modifications is well within the
routine skill in the art, as is determination of retention of biological activity of the encoded
products. Moreover, the skilled artisan recognizes that substantially similar nucleic acid
sequences encompassed by this invention are also defined by their ability to hybridize, under
stringent conditions (0.1X SSC, 0.1% SDS, 65°C), with the sequences exemplified herein.
Preferred substantially similar nucleic acid fragments of the instant invention are those
nucleic acid fragments whose DNA sequences are 80% identical to the coding sequence of
the nucleic acid fragments reported herein. More preferred nucleic acid fragments are 90%
identical to the coding sequence of the nucleic acid fragments reported herein. Most
preferred are nucleic acid fragments that are 95% identical to the coding sequence of the
nucleic acid fragments reported herein. Sequence alignments and percent similarity
calculations were performed using the Megalign program of the LASERGENE
bioinformatics computing suite (DNASTAR Inc., Madison, WI). Multiple alignment of the
sequences was performed using the Clustal method of alignment (Higgins, D. G. and Sharp,
P. M. (1989) CABIOS 5:151-153) with the default parameters (GAP PENALTY=10, GAP
LENGTH PENALTY=10). Default parameters for pairwise alignments using the Clustal
method were KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5.
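
As an illustration of how the 80%, 90% and 95% identity levels above might be applied to a pair of already-aligned sequences, the following sketch computes percent identity and reports the corresponding preference tier. It is not the Megalign calculation: the choice to skip gapped columns in the denominator, the reading of each percentage as a minimum, the function names and the example sequences are all assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical positions over aligned columns that contain no gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == "-" or base_b == "-":
            continue  # assumption: gapped columns are excluded from the calculation
        compared += 1
        if base_a == base_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def preference_tier(identity: float) -> str:
    """Map a percent identity onto the tiers named in the text (treated as minimums)."""
    if identity >= 95.0:
        return "most preferred"
    if identity >= 90.0:
        return "more preferred"
    if identity >= 80.0:
        return "preferred"
    return "below the stated thresholds"


# Hypothetical aligned sequences; '-' marks a gap introduced by the alignment program.
seq_a = "ATGGC-TAGC"
seq_b = "ATGGCATAGG"
identity = percent_identity(seq_a, seq_b)
print(round(identity, 2), preference_tier(identity))
```

Different programs define the denominator differently (for example, alignment length including gaps), so values from this sketch should not be expected to match Megalign output exactly.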

A "substantial portion" of an amino acid or nucleotide sequence comprises enough of
the amino acid sequence of a polypeptide or the nucleotide sequence of a gene to afford
putative identification of that polypeptide or gene, either by manual evaluation of the
sequence by one skilled in the art, or by computer-automated sequence comparison and
identification using algorithms such as BLAST (Basic Local Alignment Search Tool;
Altschul, S. F., et al., (1993) J. Mol. Biol. 215:403-410; see also
www.ncbi.nlm.nih.gov/BLAST/). In general, a sequence of ten or more contiguous amino
acids or thirty or more nucleotides is necessary in order to putatively identify a polypeptide
or nucleic acid sequence as homologous to a known protein or gene. Moreover, with respect
to nucleotide sequences, gene specific oligonucleotide probes comprising 20-30 contiguous
nucleotides may be used in sequence-dependent methods of gene identification (e.g.,
Southern hybridization) and isolation (e.g., in situ hybridization of bacterial colonies or
bacteriophage plaques). In addition, short oligonucleotides of 12-15 bases may be used as
amplification primers in PCR in order to obtain a particular nucleic acid fragment
comprising the primers. Accordingly, a "substantial portion" of a nucleotide sequence
comprises enough of the sequence to afford specific identification and/or isolation of a
nucleic acid fragment comprising the sequence. The instant specification teaches partial or
complete amino acid and nucleotide sequences encoding one or more particular plant
proteins. The skilled artisan, having the benefit of the sequences as reported herein, may
now use all or a substantial portion of the disclosed sequences for purposes known to those
skilled in this art. Accordingly, the instant invention comprises the complete sequences as
reported in the accompanying Sequence Listing, as well as substantial portions of those
sequences as defined above.

"Codon degeneracy" refers to divergence in the genetic code permitting variation of
the nucleotide sequence without affecting the amino acid sequence of an encoded
polypeptide. Accordingly, the instant invention relates to any nucleic acid fragment that
encodes all or a substantial portion of the amino acid sequence encoding the beta carotene
hydroxylase, lycopene cyclase or lycopene epsilon cyclase proteins as set forth in SEQ ID
NOs:2, 4, 6, 8, 10, 12, 14, 16, 18 and 20. The skilled artisan is well aware of the "codon-bias"
exhibited by a specific host cell in usage of nucleotide codons to specify a given amino
acid. Therefore, when synthesizing a gene for improved expression in a host cell, it is
desirable to design the gene such that its frequency of codon usage approaches the frequency
of preferred codon usage of the host cell.

"Synthetic genes" can be assembled from oligonucleotide building blocks that are
chemically synthesized using procedures known to those skilled in the art. These building
blocks are ligated and annealed to form gene segments which are then enzymatically
assembled to construct the entire gene. "Chemically synthesized", as related to a sequence
of DNA, means that the component nucleotides were assembled in vitro. Manual chemical
synthesis of DNA may be accomplished using well established procedures, or automated
chemical synthesis can be performed using one of a number of commercially available
machines. Accordingly, the genes can be tailored for optimal gene expression based on
optimization of nucleotide sequence to reflect the codon bias of the host cell. The skilled
artisan appreciates the likelihood of successful gene expression if codon usage is biased
towards those codons favored by the host. Determination of preferred codons can be based
on a survey of genes derived from the host cell where sequence information is available.
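
The idea of surveying host genes for preferred codons and then designing a synthetic gene around them can be sketched as follows. This is a simplified illustration, assuming the standard genetic code, in-frame coding sequences, and a "most frequent codon wins" rule; the host sequences, function names and example peptide are hypothetical, and practical codon optimization usually weighs additional factors.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def preferred_codons(coding_sequences):
    """Survey in-frame coding sequences and pick the most frequent codon per amino acid."""
    counts = Counter()
    for cds in coding_sequences:
        usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    best = {}
    for codon, seen in counts.items():
        amino_acid = CODON_TABLE.get(codon)
        if amino_acid is None:
            continue  # skip codons containing non-ACGT characters
        if amino_acid not in best or seen > counts[best[amino_acid]]:
            best[amino_acid] = codon
    return best


def back_translate(protein, codon_choice):
    """Write a nucleotide sequence for `protein` using one chosen codon per amino acid."""
    missing = sorted(set(protein) - set(codon_choice))
    if missing:
        raise ValueError(f"no surveyed codon for: {missing}")
    return "".join(codon_choice[aa] for aa in protein)


# Hypothetical host coding sequences used only to illustrate the survey.
host_genes = ["ATGGCCGCCGCTAAGTGA", "ATGGCCAAGAAACTGTGA"]
choice = preferred_codons(host_genes)
synthetic = back_translate("MAKL*", choice)
print(synthetic)
```

Because of codon degeneracy, alanine here could be written as GCT, GCC, GCA or GCG without changing the encoded polypeptide; the sketch simply selects whichever synonymous codon the surveyed host genes use most often.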

"Gene" refers to a nucleic acid fragment that expresses a specific protein, including
regulatory sequences preceding (5' non-coding sequences) and following (3' non-coding
sequences) the coding sequence. "Native gene" refers to a gene as found in nature with its
own regulatory sequences. "Chimeric gene" refers to any gene that is not a native gene,
comprising regulatory and coding sequences that are not found together in nature.
Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that
are derived from different sources, or regulatory sequences and coding sequences derived
from the same source, but arranged in a manner different than that found in nature.
"Endogenous gene" refers to a native gene in its natural location in the genome of an
organism. A "foreign" gene refers to a gene not normally found in the host organism, but
that is introduced into the host organism by gene transfer. Foreign genes can comprise
native genes inserted into a non-native organism, or chimeric genes. A "transgene" is a gene
that has been introduced into the genome by a transformation procedure.

"Coding sequence" refers to a DNA sequence that codes for a specific amino acid
sequence. "Regulatory sequences" refer to nucleotide sequences located upstream (5' non-coding
sequences), within, or downstream (3' non-coding sequences) of a coding sequence,
and which influence the transcription, RNA processing or stability, or translation of the
associated coding sequence. Regulatory sequences may include promoters, translation
leader sequences, introns, and polyadenylation recognition sequences.

"Promoter" refers to a DNA sequence capable of controlling the expression of a
coding sequence or functional RNA. In general, a coding sequence is located 3' to a
promoter sequence. The promoter sequence consists of proximal and more distal upstream
elements, the latter elements often referred to as enhancers. Accordingly, an "enhancer" is a
DNA sequence which can stimulate promoter activity and may be an innate element of the
promoter or a heterologous element inserted to enhance the level or tissue-specificity of a
promoter. Promoters may be derived in their entirety from a native gene, or be composed of
different elements derived from different promoters found in nature, or even comprise
synthetic DNA segments. It is understood by those skilled in the art that different promoters
may direct the expression of a gene in different tissues or cell types, or at different stages of
development, or in response to different environmental conditions. Promoters which cause a
gene to be expressed in most cell types at most times are commonly referred to as
"constitutive promoters". New promoters of various types useful in plant cells are
constantly being discovered; numerous examples may be found in the compilation by
Okamuro and Goldberg, (1989) Biochemistry of Plants 15:1-82. It is further recognized that
since in most cases the exact boundaries of regulatory sequences have not been completely
defined, DNA fragments of different lengths may have identical promoter activity.

The "translation leader sequence" refers to a DNA sequence located between the
promoter sequence of a gene and the coding sequence. The translation leader sequence is
present in the fully processed mRNA upstream of the translation start sequence. The
translation leader sequence may affect processing of the primary transcript to mRNA,
mRNA stability or translation efficiency. Examples of translation leader sequences have
been described (Turner, R. and Foster, G. D. (1995) Molecular Biotechnology 3:225).

The "3' non-coding sequences" refer to DNA sequences located downstream of a
coding sequence and include polyadenylation recognition sequences and other sequences
encoding regulatory signals capable of affecting mRNA processing or gene expression. The
polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid
tracts to the 3' end of the mRNA precursor. The use of different 3' non-coding sequences is
exemplified by Ingelbrecht et al., (1989) Plant Cell 1:671-680.

"RNA transcript" refers to the product resulting from RNA polymerase-catalyzed
transcription of a DNA sequence. When the RNA transcript is a perfect complementary
copy of the DNA sequence, it is referred to as the primary transcript, or it may be an RNA
sequence derived from posttranscriptional processing of the primary transcript, in which case
it is referred to as the mature RNA. "Messenger RNA (mRNA)" refers to the RNA that is
without introns and that can be translated into protein by the cell. "cDNA" refers to a
double-stranded DNA that is complementary to and derived from mRNA. "Sense" RNA
refers to an RNA transcript that includes the mRNA and so can be translated into protein by the
cell. "Antisense RNA" refers to an RNA transcript that is complementary to all or part of a
target primary transcript or mRNA and that blocks the expression of a target gene (U.S.
Patent No. 5,107,065, incorporated herein by reference). The complementarity of an
antisense RNA may be with any part of the specific gene transcript, i.e., at the 5' non-coding
sequence, 3' non-coding sequence, introns, or the coding sequence. "Functional RNA"
refers to sense RNA, antisense RNA, ribozyme RNA, or other RNA that may not be
translated but yet has an effect on cellular processes.
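
The relationship between sense and antisense RNA described above can be shown with a short sketch. It assumes the input is the coding (sense) strand of a DNA fragment written 5' to 3'; the function names and the nine-base example are hypothetical and serve only to illustrate complementarity.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def sense_rna(coding_strand: str) -> str:
    """Sense RNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def antisense_rna(coding_strand: str) -> str:
    """Antisense RNA is complementary to the sense RNA, written 5' to 3'."""
    return reverse_complement(coding_strand).replace("T", "U")


# Hypothetical coding-strand fragment (start codon, one codon, stop codon).
fragment = "ATGGCTTGA"
print(sense_rna(fragment))
print(antisense_rna(fragment))
```

Read in opposite directions, each base of the antisense transcript pairs with the corresponding base of the sense transcript, which is the complementarity the definition relies on.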

The term "operably linked" refers to the association of nucleic acid sequences on a
single nucleic acid fragment so that the function of one is affected by the other. For
example, a promoter is operably linked with a coding sequence when it is capable of
affecting the expression of that coding sequence (i.e., that the coding sequence is under the
transcriptional control of the promoter). Coding sequences can be operably linked to
regulatory sequences in sense or antisense orientation.

The term "expression", as used herein, refers to the transcription and stable
accumulation of sense (mRNA) or antisense RNA derived from the nucleic acid fragment of
the invention. Expression may also refer to translation of mRNA into a polypeptide.
"Antisense inhibition" refers to the production of antisense RNA transcripts capable of
suppressing the expression of the target protein. "Overexpression" refers to the production
of a gene product in transgenic organisms that exceeds levels of production in normal or
non-transformed organisms. "Co-suppression" refers to the production of sense RNA
transcripts capable of suppressing the expression of identical or substantially similar foreign
or endogenous genes (U.S. Patent No. 5,231,020, incorporated herein by reference).
"Altered levels" refers to the production of gene product(s) in transgenic organisms in
amounts or proportions that differ from that of normal or non-transformed organisms.

"Mature" protein refers to a post-translationally processed polypeptide; i.e., one from
which any pre- or propeptides present in the primary translation product have been removed.
"Precursor" protein refers to the primary product of translation of mRNA; i.e., with pre- and
propeptides still present. Pre- and propeptides may be but are not limited to intracellular
localization signals.

A "chloroplast transit peptide" is an amino acid sequence which is translated in
conjunction with a protein and directs the protein to the chloroplast or other plastid types
present in the cell in which the protein is made. "Chloroplast transit sequence" refers to a
nucleotide sequence that encodes a chloroplast transit peptide. A "signal peptide" is an
amino acid sequence which is translated in conjunction with a protein and directs the protein
to the secretory system (Chrispeels, J. J., (1991) Ann. Rev. Plant Phys. Plant Mol. Biol.
42:21-53). If the protein is to be directed to a vacuole, a vacuolar targeting signal (supra)
can further be added, or if to the endoplasmic reticulum, an endoplasmic reticulum retention
signal (supra) may be added. If the protein is to be directed to the nucleus, any signal
peptide present should be removed and instead a nuclear localization signal included
(Raikhel (1992) Plant Phys. 100:1627-1632).

"Transformation" refers to the transfer of a nucleic acid fragment into the genome of a
host organism, resulting in genetically stable inheritance. Host organisms containing the
transformed nucleic acid fragments are referred to as "transgenic" organisms. Examples of
methods of plant transformation include Agrobacterium-mediated transformation (De Blaere
et al. (1987) Meth. Enzymol. 143:277) and particle-accelerated or "gene gun" transformation
technology (Klein T. M. et al. (1987) Nature (London) 327:70-73; U.S. Patent
No. 4,945,050, incorporated herein by reference).

Standard recombinant DNA and molecular cloning techniques used herein are well 
known in the art and are described more fully in Sambrook, J., Fritsch, E. F. and Maniatis,
T. Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold
Spring Harbor, 1989 (hereinafter "Maniatis"). 

Nucleic acid fragments encoding at least a portion of several carotenoid biosynthetic
enzymes have been isolated and identified by comparison of random plant cDNA sequences
to public databases containing nucleotide and protein sequences using the BLAST
algorithms well known to those skilled in the art. Table 1 lists the proteins that are described
herein, and the designation of the cDNA clones that comprise the nucleic acid fragments
encoding these proteins.






WO 99/55887 



PCT/US99/08384 



TABLE 1

Carotenoid Biosynthetic Enzymes

Enzyme                      Clone                     Plant

Beta Carotene Hydroxylase   Contig of:                Corn
                              csi1.pk0019.g1
                              csi1n.pk0035.f7
                            Contig of:                Rice
                              rl0n.pk102.k19
                              rls72.pk0008.e3
                            sfl1.pk0049.g8            Soybean
                            wkm2n.pk008.a5            Wheat

Lycopene Cyclase            Contig of:                Corn
                              p0110.cgsmy23r
                              p0110.cgsmj48r
                              cr1n.pk0051.h6
                            sfl1.pk0034.c1            Soybean
                            wle1n.pk0059.f5           Wheat

Lycopene Epsilon Cyclase    cen3n.pk0135.f10          Corn
                            Contig of:                Corn
                              p0126.cnldl94r
                              p0126.cnldj12r
                            sr1.pk0068.b1             Soybean

The nucleic acid fragments of the instant invention may be used to isolate cDNAs and
genes encoding homologous proteins from the same or other plant species. Isolation of
homologous genes using sequence-dependent protocols is well known in the art. Examples
of sequence-dependent protocols include, but are not limited to, methods of nucleic acid
hybridization, and methods of DNA and RNA amplification as exemplified by various uses
of nucleic acid amplification technologies (e.g., polymerase chain reaction, ligase chain
reaction).

For example, genes encoding other beta carotene hydroxylase, lycopene cyclase or
lycopene epsilon cyclase, either as cDNAs or genomic DNAs, could be isolated directly by
using all or a portion of the instant nucleic acid fragments as DNA hybridization probes to
screen libraries from any desired plant employing methodology well known to those skilled
in the art. Specific oligonucleotide probes based upon the instant nucleic acid sequences can
be designed and synthesized by methods known in the art (Maniatis). Moreover, the entire
sequences can be used directly to synthesize DNA probes by methods known to the skilled
artisan such as random primer DNA labeling, nick translation, or end-labeling techniques, or
RNA probes using available in vitro transcription systems. In addition, specific primers can
be designed and used to amplify a part or all of the instant sequences. The resulting
amplification products can be labeled directly during amplification reactions or labeled after
amplification reactions, and used as probes to isolate full length cDNA or genomic
fragments under conditions of appropriate stringency.

In addition, two short segments of the instant nucleic acid fragments may be used in
polymerase chain reaction protocols to amplify longer nucleic acid fragments encoding
homologous genes from DNA or RNA. The polymerase chain reaction may also be
performed on a library of cloned nucleic acid fragments wherein the sequence of one primer
is derived from the instant nucleic acid fragments, and the sequence of the other primer takes
advantage of the presence of the polyadenylic acid tracts to the 3' end of the mRNA
precursor encoding plant genes. Alternatively, the second primer sequence may be based
upon sequences derived from the cloning vector. For example, the skilled artisan can follow
the RACE protocol (Frohman et al., (1988) Proc. Natl. Acad. Sci. USA 85:8998) to generate
cDNAs by using PCR to amplify copies of the region between a single point in the
transcript and the 3' or 5' end. Primers oriented in the 3' and 5' directions can be designed
from the instant sequences. Using commercially available 3' RACE or 5' RACE systems
(BRL), specific 3' or 5' cDNA fragments can be isolated (Ohara et al., (1989) Proc. Natl.
Acad. Sci. USA 86:5673; Loh et al., (1989) Science 243:217). Products generated by the 3'
and 5' RACE procedures can be combined to generate full-length cDNAs (Frohman, M. A.
and Martin, G. R., (1989) Techniques 1:165).

Availability of the instant nucleotide and deduced amino acid sequences facilitates
immunological screening of cDNA expression libraries. Synthetic peptides representing
portions of the instant amino acid sequences may be synthesized. These peptides can be
used to immunize animals to produce polyclonal or monoclonal antibodies with specificity
for peptides or proteins comprising the amino acid sequences. These antibodies can then
be used to screen cDNA expression libraries to isolate full-length cDNA clones of interest
(Lerner, R. A. (1984) Adv. Immunol. 36:1; Maniatis).

The nucleic acid fragments of the instant invention may be used to create transgenic
plants in which the disclosed beta carotene hydroxylase, lycopene cyclase or lycopene
epsilon cyclase are present at higher or lower levels than normal or in cell types or
developmental stages in which they are not normally found. This would have the effect of
altering the level of alpha carotene or beta carotene in those cells.

Co-suppression of the lycopene epsilon cyclase in corn endosperm should divert more
of the xanthophyll biosynthesis towards zeaxanthin. Zeaxanthin gives a consumer-preferred
darker color to yolks and poultry products. The xanthophyll present in maize is well utilized
by poultry but is present in very small amounts (25 to 50 mg/kg compared to 6,000 to
10,000 in marigolds). Thus, increasing the amount of zeaxanthin in corn would produce a
more desirable product for poultry feed. Hydroxylases catalyze the last step in xanthophyll
biosynthesis. Increasing hydroxylase activity in corn endosperm may increase xanthophyll
content. Blocking hydroxylase activity may create a high beta-carotene corn which may be
valuable for human consumption as well as animal feed.

Overexpression of the beta carotene hydroxylase, lycopene cyclase or lycopene
epsilon cyclase proteins of the instant invention may be accomplished by first constructing a
chimeric gene in which the coding region is operably linked to a promoter capable of
directing expression of a gene in the desired tissues at the desired stage of development. For
reasons of convenience, the chimeric gene may comprise promoter sequences and translation
leader sequences derived from the same genes. 3' non-coding sequences encoding
transcription termination signals may also be provided. The instant chimeric gene may also
comprise one or more introns in order to facilitate gene expression.

Plasmid vectors comprising the instant chimeric gene can then be constructed. The
choice of plasmid vector is dependent upon the method that will be used to transform host
plants. The skilled artisan is well aware of the genetic elements that must be present on the
plasmid vector in order to successfully transform, select and propagate host cells containing
the chimeric gene. The skilled artisan will also recognize that different independent
transformation events will result in different levels and patterns of expression (Jones et al.,
(1985) EMBO J. 4:2411-2418; De Almeida et al., (1989) Mol. Gen. Genetics 218:78-86),
and thus that multiple events must be screened in order to obtain lines displaying the desired
expression level and pattern. Such screening may be accomplished by Southern analysis of
DNA, Northern analysis of mRNA expression, Western analysis of protein expression, or
phenotypic analysis.

For some applications it may be useful to direct the instant carotenoid biosynthetic
enzyme to different cellular compartments, or to facilitate its secretion from the cell. It is
thus envisioned that the chimeric gene described above may be further supplemented by
altering the coding sequence to encode beta carotene hydroxylase with appropriate
intracellular targeting sequences such as transit sequences (Keegstra, K. (1989) Cell
56:247-253), signal sequences or sequences encoding endoplasmic reticulum localization
(Chrispeels, J. J., (1991) Ann. Rev. Plant Phys. Plant Mol. Biol. 42:21-53), or nuclear
localization signals (Raikhel, N. (1992) Plant Phys. 100:1627-1632) added and/or with
targeting sequences that are already present removed. While the references cited give
examples of each of these, the list is not exhaustive and more targeting signals of utility may
be discovered in the future.

It may also be desirable to reduce or eliminate expression of genes encoding beta
carotene hydroxylase, lycopene cyclase or lycopene epsilon cyclase in plants for some
applications. In order to accomplish this, a chimeric gene designed for co-suppression of the
instant carotenoid biosynthetic enzyme can be constructed by linking a gene or gene
fragment encoding a beta carotene hydroxylase, lycopene cyclase or lycopene epsilon
cyclase to plant promoter sequences. Alternatively, a chimeric gene designed to express
antisense RNA for all or part of the instant nucleic acid fragment can be constructed by
linking the gene or gene fragment in reverse orientation to plant promoter sequences. Either
the co-suppression or antisense chimeric genes could be introduced into plants via
transformation wherein expression of the corresponding endogenous genes is reduced or
eliminated.
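
By way of illustration only, the difference between a sense (co-suppression) and an antisense arrangement of a gene fragment relative to a promoter may be sketched as follows. The function names, the cassette layout and the short sequences are hypothetical placeholders and are not taken from the instant sequences; the sketch merely shows that "reverse orientation" corresponds to inserting the reverse complement of the fragment.

```python
# Illustrative sketch only: names and sequences are hypothetical placeholders.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_cassette(promoter: str, fragment: str, terminator: str,
                   antisense: bool = False) -> str:
    """Join promoter, gene fragment and 3' end into one linear sequence.

    With antisense=True the fragment is placed in reverse orientation,
    i.e. as its reverse complement, so that transcription from the
    promoter would yield RNA complementary to the endogenous mRNA.
    """
    insert = reverse_complement(fragment) if antisense else fragment
    return promoter + insert + terminator


if __name__ == "__main__":
    sense = build_cassette("AAA", "ATGC", "TTT")
    anti = build_cassette("AAA", "ATGC", "TTT", antisense=True)
    print(sense, anti)
```

Real constructs are of course assembled by restriction digestion and ligation rather than string concatenation; the sketch only fixes the orientation convention.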



The instant beta carotene hydroxylase, lycopene cyclase or lycopene epsilon cyclase
(or portions thereof) may be produced in heterologous host cells, particularly in the cells of
microbial hosts, and can be used to prepare antibodies to these proteins by methods well
known to those skilled in the art. The antibodies are useful for detecting beta carotene
hydroxylase, lycopene cyclase or lycopene epsilon cyclase in situ in cells or in vitro in cell
extracts. Preferred heterologous host cells for production of the instant beta carotene
hydroxylase, lycopene cyclase or lycopene epsilon cyclase are microbial hosts. Microbial
expression systems and expression vectors containing regulatory sequences that direct high
level expression of foreign proteins are well known to those skilled in the art. Any of these
could be used to construct a chimeric gene for production of the instant beta carotene
hydroxylase, lycopene cyclase or lycopene epsilon cyclase. This chimeric gene could then
be introduced into appropriate microorganisms via transformation to provide high level
expression of the encoded carotenoid biosynthetic enzyme. An example of a vector for high
level expression of the instant beta carotene hydroxylase, lycopene cyclase or lycopene
epsilon cyclase in a bacterial host is provided (Example 8).

Additionally, the instant beta carotene hydroxylase, lycopene cyclase or lycopene
epsilon cyclase can be used as targets to facilitate design and/or identification of inhibitors
of those enzymes that may be useful as herbicides. This is desirable because the beta
carotene hydroxylase, lycopene cyclase and lycopene epsilon cyclase described herein
catalyze various steps in carotenoid biosynthesis. Accordingly, inhibition of the activity of
one or more of the enzymes described herein could lead to inhibition of plant growth. Thus,
the instant beta carotene hydroxylase, lycopene cyclase or lycopene epsilon cyclase could be
appropriate for new herbicide discovery and design.

All or a substantial portion of the nucleic acid fragments of the instant invention may
also be used as probes for genetically and physically mapping the genes that they are a part
of, and as markers for traits linked to those genes. Such information may be useful in plant
breeding in order to develop lines with desired phenotypes. For example, the instant nucleic
acid fragments may be used as restriction fragment length polymorphism (RFLP) markers.
Southern blots (Maniatis) of restriction-digested plant genomic DNA may be probed with
the nucleic acid fragments of the instant invention. The resulting banding patterns may then
be subjected to genetic analyses using computer programs such as MapMaker (Lander et al.,
(1987) Genomics 1:174-181) in order to construct a genetic map. In addition, the nucleic
acid fragments of the instant invention may be used to probe Southern blots containing
restriction endonuclease-treated genomic DNAs of a set of individuals representing parent
and progeny of a defined genetic cross. Segregation of the DNA polymorphisms is noted
and used to calculate the position of the instant nucleic acid sequence in the genetic map
previously obtained using this population (Botstein, D. et al., (1980) Am. J. Hum. Genet.
32:314-331).
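
As a simplified illustration of how segregation data may be turned into a map position, the following sketch estimates the recombination fraction between two markers scored on the same progeny and converts it to a map distance. It is offered under stated assumptions and is not the procedure used by MapMaker: the scoring scheme (one character per individual, "-" for a missing score, backcross-type data with known parental phase), the function names and the example scores are all hypothetical, and the Haldane mapping function is chosen here only as one commonly used conversion.

```python
import math


def recombination_fraction(marker_a: str, marker_b: str) -> float:
    """Fraction of scored progeny whose genotypes differ at two markers.

    Assumption: backcross-type scores with known parental phase, one
    character per individual, '-' marking a missing score.
    """
    if len(marker_a) != len(marker_b):
        raise ValueError("markers must be scored on the same individuals")
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no individuals scored at both markers")
    recombinants = sum(1 for a, b in pairs if a != b)
    return recombinants / len(pairs)


def haldane_cm(r: float) -> float:
    """Convert a recombination fraction to centimorgans (Haldane function)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    # Hypothetical scores for ten progeny at a known marker and at the probe.
    known = "AAHHAHAHHA"
    probe = "AAHHAHAHHH"
    r = recombination_fraction(known, probe)
    print(r, round(haldane_cm(r), 2))
```

Placing a new probe on an existing map would repeat this pairwise estimate against each previously mapped marker and select the interval giving the most consistent order.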

The production and use of plant gene-derived probes for use in genetic mapping is
described in Bernatzky, R. and Tanksley, S. D. (1986) Plant Mol. Biol. Reporter
4(1):37-41. Numerous publications describe genetic mapping of specific cDNA clones
using the methodology outlined above or variations thereof. For example, F2 intercross
populations, backcross populations, randomly mated populations, near isogenic lines, and
other sets of individuals may be used for mapping. Such methodologies are well known to
those skilled in the art.

Nucleic acid probes derived from the instant nucleic acid sequences may also be used
for physical mapping (i.e., placement of sequences on physical maps; see Hoheisel, J. D., et
al., In: Nonmammalian Genomic Analysis: A Practical Guide, Academic Press 1996,
pp. 319-346, and references cited therein).

In another embodiment, nucleic acid probes derived from the instant nucleic acid
sequences may be used in direct fluorescence in situ hybridization (FISH) mapping (Trask,
B. J. (1991) Trends Genet. 7:149-154). Although current methods of FISH mapping favor
use of large clones (several to several hundred kb; see Laan, M. et al. (1995) Genome
Research 5:13-20), improvements in sensitivity may allow performance of FISH mapping
using shorter probes.

A variety of nucleic acid amplification-based methods of genetic and physical
mapping may be carried out using the instant nucleic acid sequences. Examples include
allele-specific amplification (Kazazian, H. H. (1989) J. Lab. Clin. Med. 114(2):95-96),
polymorphism of PCR-amplified fragments (CAPS; Sheffield, V. C. et al. (1993) Genomics
16:325-332), allele-specific ligation (Landegren, U. et al. (1988) Science 241:1077-1080),
nucleotide extension reactions (Sokolov, B. P. (1990) Nucleic Acid Res. 18:3671), Radiation
Hybrid Mapping (Walter, M. A. et al. (1997) Nature Genetics 7:22-28) and Happy Mapping
(Dear, P. H. and Cook, P. R. (1989) Nucleic Acid Res. 17:6795-6807). For these methods,
the sequence of a nucleic acid fragment is used to design and produce primer pairs for use in
the amplification reaction or in primer extension reactions. The design of such primers is
well known to those skilled in the art. In methods employing PCR-based genetic mapping,
it may be necessary to identify DNA sequence differences between the parents of the
mapping cross in the region corresponding to the instant nucleic acid sequence. This,
however, is generally not necessary for mapping methods.

Loss of function mutant phenotypes may be identified for the instant cDNA clones
either by targeted gene disruption protocols or by identifying specific mutants for these
genes contained in a maize population carrying mutations in all possible genes (Ballinger
and Benzer, (1989) Proc. Natl. Acad. Sci. USA 86:9402; Koes et al., (1995) Proc. Natl. Acad.
Sci. USA 92:8149; Bensen et al., (1995) Plant Cell 7:75). The latter approach may be
accomplished in two ways. First, short segments of the instant nucleic acid fragments may
be used in polymerase chain reaction protocols in conjunction with a mutation tag sequence
primer on DNAs prepared from a population of plants in which Mutator transposons or some
other mutation-causing DNA element has been introduced (see Bensen, supra). The
amplification of a specific DNA fragment with these primers indicates the insertion of the
mutation tag element in or near the plant gene encoding the beta carotene hydroxylase,
lycopene cyclase or lycopene epsilon cyclase. Alternatively, the instant nucleic acid
fragment may be used as a hybridization probe against PCR amplification products
generated from the mutation population using the mutation tag sequence primer in
conjunction with an arbitrary genomic site primer, such as that for a restriction enzyme
site-anchored synthetic adaptor. With either method, a plant containing a mutation in the
endogenous gene encoding a beta carotene hydroxylase, lycopene cyclase or lycopene
epsilon cyclase can be identified and obtained. This mutant plant can then be used to
determine or confirm the natural function of the beta carotene hydroxylase, lycopene cyclase
or lycopene epsilon cyclase gene product.

EXAMPLES

The present invention is further defined in the following Examples, in which all parts
and percentages are by weight and degrees are Celsius, unless otherwise stated. It should be
understood that these Examples, while indicating preferred embodiments of the invention,
are given by way of illustration only. From the above discussion and these Examples, one
skilled in the art can ascertain the essential characteristics of this invention, and without
departing from the spirit and scope thereof, can make various changes and modifications of
the invention to adapt it to various usages and conditions.

EXAMPLE 1

Composition of cDNA Libraries; Isolation and Sequencing of cDNA Clones

cDNA libraries representing mRNAs from various corn, rice, soybean and wheat
tissues were prepared. The characteristics of the libraries are described below.

TABLE 2

cDNA Libraries from Corn, Rice, Soybean and Wheat

Library   Tissue                                                  Clone

cen3n     Corn Endosperm 20 Days After Pollination*               cen3n.pk0135.f10
cr1n      Corn Root From 7 Day Old Seedlings*                     cr1n.pk0051.h6
csi1      Corn Silk                                               csi1.pk0019.g1
csi1n     Corn Silk*                                              csi1n.pk0035.f7
p0110     Corn Leaf Tissue (minus midrib), Stages V3/V4,          p0110.cgsmj48r
          Infiltrated With Salicylic Acid for 4 Hours,            p0110.cgsmy23r
          24 Hours and 7 Days, Pooled*
p0126     Corn Leaf Tissue From V8-V10 Stages, Pooled,            p0126.cnldj12r
          Night-Harvested                                         p0126.cnldl94r
rl0n      Rice 15 Day Old Leaf*                                   rl0n.pk102.k19
rls72     Rice Leaf 15 Days After Germination, 72 Hours After     rls72.pk0008.e3
          Infection of Strain Magnaporthe grisea 4360-R-67
          (AVR2-YAMO); Susceptible
sfl1      Soybean Immature Flower                                 sfl1.pk0034.c1
                                                                  sfl1.pk0049.g8
sr1       Soybean Root                                            sr1.pk0068.b1
wkm2n     Wheat Kernel Malted 175 Hours at 4 Degrees Celsius*     wkm2n.pk008.a5
wle1n     Wheat Leaf From 7 Day Old Etiolated Seedling*           wle1n.pk0059.f5

*These libraries were normalized essentially as described in U.S. Patent No. 5,482,845.

cDNA libraries were prepared in Uni-ZAP™ XR vectors according to the
manufacturer's protocol (Stratagene Cloning Systems, La Jolla, CA). Conversion of the
Uni-ZAP™ XR libraries into plasmid libraries was accomplished according to the protocol
provided by Stratagene. Upon conversion, cDNA inserts were contained in the plasmid
vector pBluescript. cDNA inserts from randomly picked bacterial colonies containing
recombinant pBluescript plasmids were amplified via polymerase chain reaction using
primers specific for vector sequences flanking the inserted cDNA sequences or plasmid
DNA was prepared from cultured bacterial cells. Amplified insert DNAs or plasmid DNAs
were sequenced in dye-primer sequencing reactions to generate partial cDNA sequences
(expressed sequence tags or "ESTs"; see Adams, M. D. et al., (1991) Science 252:1651).
The resulting ESTs were analyzed using a Perkin Elmer Model 377 fluorescent sequencer.

EXAMPLE 2

Identification of cDNA Clones

ESTs encoding carotenoid biosynthetic enzymes were identified by conducting
BLAST (Basic Local Alignment Search Tool; Altschul, S. F., et al., (1993) J. Mol. Biol.
215:403-410; see also www.ncbi.nlm.nih.gov/BLAST/) searches for similarity to sequences
contained in the BLAST "nr" database (comprising all non-redundant GenBank CDS
translations, sequences derived from the 3-dimensional structure Brookhaven Protein Data
Bank, the last major release of the SWISS-PROT protein sequence database, EMBL, and
DDBJ databases). The cDNA sequences obtained in Example 1 were analyzed for similarity
to all publicly available DNA sequences contained in the "nr" database using the BLASTN
algorithm provided by the National Center for Biotechnology Information (NCBI). The
DNA sequences were translated in all reading frames and compared for similarity to all
publicly available protein sequences contained in the "nr" database using the BLASTX
algorithm (Gish, W. and States, D. J. (1993) Nature Genetics 3:266-272) provided by the
NCBI. For convenience, the P-value (probability) of observing a match of a cDNA
sequence to a sequence contained in the searched databases merely by chance as calculated
by BLAST are reported herein as "pLog" values, which represent the negative of the
logarithm of the reported P-value. Accordingly, the greater the pLog value, the greater the
likelihood that the cDNA sequence and the BLAST "hit" represent homologous proteins.
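
The relationship between a reported P-value and its pLog value may be illustrated with the following short sketch. The base of the logarithm is not stated above; base 10 is assumed here. The function name is hypothetical, and the treatment of a P-value reported as zero (returned as infinity) is likewise an assumption made only so that the sketch is complete.

```python
import math


def plog(p_value: float) -> float:
    """Return the negative base-10 logarithm of a BLAST P-value.

    Assumptions: base-10 logarithm; a P-value of exactly 0 has no finite
    pLog and is returned as infinity.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p_value == 0.0:
        return math.inf
    return -math.log10(p_value)


if __name__ == "__main__":
    # A smaller chance probability gives a larger pLog.
    for p in (1e-5, 1e-27, 1e-64):
        print(p, round(plog(p), 2))
```

Under this convention a match with a chance probability of one in a million has a pLog of 6, and each further tenfold drop in the P-value raises the pLog by one.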

EXAMPLE 3

Characterization of cDNA Clones Encoding Beta Carotene Hydroxylase

The BLASTX search using the EST sequences from clones csi1.pk0019.g1,
csi1n.pk0035.f7, rls72.pk0008.e3, and a contig formed by the sequences from clones
ssm.pk0030.d2, sfl1.pk0049.g8, se2.12e09, ssm.pk0023.b3, and ssm.pk0030.d2 revealed
similarity of the proteins encoded by the cDNAs to Beta Carotene Hydroxylase from
Arabidopsis thaliana (GenBank Accession No. U58919). The BLAST results for each of
these ESTs and the contig are shown in Table 3:

TABLE 3

BLAST Results for Clones Encoding Polypeptides Homologous
to Beta Carotene Hydroxylase

                        BLAST pLog Score
Clone                   U58919

csi1.pk0019.g1          64.26
csi1n.pk0035.f7         27.10
rls72.pk0008.e3         7.53
Contig formed of:       27.39
  ssm.pk0030.d2
  sfl1.pk0049.g8
  se2.12e09
  ssm.pk0023.b3
  ssm.pk0030.d2

TBLASTN analysis of the proprietary plant EST database indicated that a wheat clone
also encoded beta carotene hydroxylase. The BLASTX search using the EST sequences
from clone wkm2n.pk008.a5 revealed similarity of the proteins encoded by the cDNAs to
beta carotene hydroxylase 2 from Capsicum annuum (NCBI General Identifier
No. 2956717), with a pLog value of 72.22.

The sequence of the entire cDNA insert in clone csi1n.pk0035.f7 was determined and
a contig assembled with this sequence and the EST sequence from clone csi1.pk0019.g1.
The sequence from this contig is shown in SEQ ID NO:1; the deduced amino acid sequence
of this cDNA is shown in SEQ ID NO:2. The sequence of the entire cDNA insert in clone
rls72.pk0008.e3 was determined and a contig assembled with this sequence and the EST
sequence from clone rl0n.pk102.k19. The sequence from this contig is shown in SEQ ID
NO:3; the deduced amino acid sequence of this cDNA is shown in SEQ ID NO:4. The
sequence of the entire cDNA insert in clone sfl1.pk0049.g8 was determined and is shown in
SEQ ID NO:5; the deduced amino acid sequence of this cDNA is shown in SEQ ID NO:6.
The amino acid sequence set forth in SEQ ID NO:6 was evaluated by BLASTP, yielding a
pLog value of 122 versus the Capsicum annuum sequence (NCBI General Identifier
No. 2956717; SEQ ID NO:24). The sequence of a portion of the cDNA insert from clone
wkm2n.pk008.a5 is shown in SEQ ID NO:7; the deduced amino acid sequence of this cDNA
is shown in SEQ ID NO:8. Figure 1 presents an alignment of the amino acid sequences set
forth in SEQ ID NOs:2, 4, 6 and 8 with the Arabidopsis thaliana and the Capsicum annuum
sequences (SEQ ID NO:21 and SEQ ID NO:22). The data in Table 4 presents a calculation
of the percent identity of the amino acid sequences set forth in SEQ ID NOs:2, 4, 6 and 8
and the Capsicum annuum sequence.

TABLE 4

Percent Identity of Amino Acid Sequences Deduced From the Nucleotide Sequences of
cDNA Clones Encoding Polypeptides Homologous to Beta Carotene Hydroxylase

                                       Percent Identity to
Clone                   SEQ ID NO.     2956717

Contig of:              2              67.9
  csi1.pk0019.g1
  csi1n.pk0035.f7
Contig of:              4              64.1
  rl0n.pk102.k19
  rls72.pk0008.e3
sfl1.pk0049.g8          6              66.3
wkm2n.pk008.a5          8              78.6

Sequence alignments and percent similarity calculations were performed using the
Megalign program of the LASARGENE bioinformatics computing suite (DNASTAR Inc.,
Madison, WI). Multiple alignment of the amino acid sequences was performed using the
Clustal method of alignment (Higgins, D. G. and Sharp, P. M. (1989) CABIOS. 5:151-153)
with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10).
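
For illustration, a percent identity value of the kind reported in the tables herein may be computed from two rows of an existing alignment as sketched below. This is a minimal sketch and not the Megalign calculation: the choice of denominator (aligned columns in which neither sequence has a gap) is an assumption, since conventions differ between programs, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues between two rows of an alignment.

    Assumption: only columns where neither row carries a gap are counted;
    other programs may divide by a different length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical six-column alignment: four matches in five ungapped columns.
    print(percent_identity("MKT-AL", "MRTQAL"))
```

Because the denominator is a convention, values computed this way are comparable only with values computed by the same rule.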

Sequence alignments and BLAST scores and probabilities indicate that the instant
nucleic acid fragments encode entire soybean beta carotene hydroxylase and portions of
corn, rice and wheat beta carotene hydroxylase. These sequences represent the first monocot
and the first soybean sequences encoding beta carotene hydroxylase.

EXAMPLE 4

Characterization of cDNA Clones Encoding Lycopene Cyclase

The BLASTX search using the EST sequences from clones cr1n.pk0051.h6 and
wre1n.pk0146.g4 revealed similarity of the proteins encoded by the cDNAs to Lycopene
Cyclase from Arabidopsis thaliana (GenBank Accession No. L40176). The BLAST results
for each of these ESTs are shown in Table 5:

TABLE 5

BLAST Results for Clones Encoding Polypeptides Homologous to Lycopene Cyclase

                        BLAST pLog Score
Clone                   GenBank Accession No. L40176

cr1n.pk0051.h6          19.52
wre1n.pk0146.g4         28.00

TBLASTN analysis of the proprietary plant EST database indicated that a soybean
clone also encoded lycopene cyclase. The BLASTX search using the EST sequence from
clone sfl1.pk0034.c1 revealed similarity of the proteins encoded by the cDNAs to
capsanthin-capsorubin synthase from Capsicum annuum (NCBI General Identifier
No. 626009), with a pLog value of >254. The Capsicum annuum capsanthin-capsorubin
synthase is a chromoplast lycopene cyclase (Hugueney, P. et al. (1995) Plant J. 8:417-424).

The sequence of the entire cDNA insert in clone cr1n.pk0051.h6 was determined and a
contig assembled with this sequence and the EST sequences from clones p0110.cgsmy23r
and p0110.cgsmj48r. This contig sequence is shown in SEQ ID NO:9; the deduced amino
acid sequence of this cDNA is shown in SEQ ID NO:10. The amino acid sequence set forth
in SEQ ID NO:10 was evaluated by BLASTP, yielding a pLog value of 153 versus the
Arabidopsis thaliana sequence (NCBI General Identifier No. 735882; SEQ ID NO:23). The
sequence of the entire cDNA insert in clone sfl1.pk0034.c1 was determined and is shown in
SEQ ID NO:11; the deduced amino acid sequence of this cDNA is shown in SEQ ID NO:12.
The amino acid sequence set forth in SEQ ID NO:12 was evaluated by BLASTP, yielding a
pLog value of >254 versus the Capsicum annuum sequence (NCBI General Identifier
No. 626009). The sequence of the entire cDNA insert in clone wle1n.pk0059.f5 was
determined and is shown in SEQ ID NO:13; the deduced amino acid sequence of this cDNA
is shown in SEQ ID NO:14. The amino acid sequence set forth in SEQ ID NO:14 was
evaluated by BLASTP, yielding a pLog value of 82.52 versus the Arabidopsis thaliana
sequence (NCBI General Identifier No. 735882). Figure 2 presents an alignment of the
amino acid sequences set forth in SEQ ID NOs:10, 12 and 14 and the Arabidopsis thaliana
sequence (SEQ ID NO:23). The data in Table 6 presents a calculation of the percent identity
of the amino acid sequences set forth in SEQ ID NOs:10, 12 and 14 and the Arabidopsis
thaliana sequence.



TABLE 6

Percent Identity of Amino Acid Sequences Deduced From the Nucleotide Sequences of
cDNA Clones Encoding Polypeptides Homologous to Lycopene Cyclase

                                       Percent Identity to
Clone                   SEQ ID NO.     735882

Contig of:              10             72.3
  p0110.cgsmy23r
  p0110.cgsmj48r
  cr1n.pk0051.h6
sfl1.pk0034.c1          12             50.4
wle1n.pk0059.f5         14             67.6

Sequence alignments and percent similarity calculations were performed using the
Megalign program of the LASARGENE bioinformatics computing suite (DNASTAR Inc.,
Madison, WI). Multiple alignment of the amino acid sequences was performed using the
Clustal method of alignment (Higgins, D. G. and Sharp, P. M. (1989) CABIOS. 5:151-153)
with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10).

Sequence alignments and BLAST scores and probabilities indicate that the instant
nucleic acid fragments encode entire soybean lycopene cyclase and portions of corn and
wheat lycopene cyclase. These sequences represent the first monocot and the first soybean
sequences encoding lycopene cyclase.

EXAMPLE 5

Characterization of cDNA Clones Encoding Lycopene Epsilon Cyclase

The BLASTX search using the nucleotide sequence from clone cen3n.pk0135.f10, and
the EST sequence from clone sr1.pk0068.b1 revealed similarity of the proteins encoded by
the cDNAs to Lycopene Epsilon Cyclase from Arabidopsis thaliana (GenBank Accession
No. U50738). The BLAST results for each of these ESTs are shown in Table 7:

TABLE 7

BLAST Results for Clones Encoding Polypeptides
Homologous to Lycopene Epsilon Cyclase

                        BLAST pLog Score
Clone                   GenBank Accession No. U50738

cen3n.pk0135.f10        37.70
sr1.pk0068.b1           31.40

TBLASTN analysis of the proprietary plant EST database indicated that other corn
clones besides cen3n.pk0135.f10 also encoded lycopene epsilon cyclase. The BLASTX
search using the contig assembled of the EST sequences from clones p0126.cnldl94r and
p0126.cnldj12r revealed similarity of the proteins encoded by the cDNAs to lycopene epsilon
cyclase from Lycopersicon esculentum (NCBI General Identifier No. 3005983), with a pLog
value of 75.00.

The nucleotide sequence of the entire cDNA insert in clone cen3n.pk0135.f10 was
determined and is shown in SEQ ID NO:15; the deduced amino acid sequence of this cDNA
is shown in SEQ ID NO:16. The amino acid sequence set forth in SEQ ID NO:16 was
evaluated by BLASTP, yielding a pLog value of 40.70 versus the Lycopersicon esculentum
sequence. The nucleotide sequence of the contig assembled from a portion of the cDNA
insert in clones p0126.cnldl94r and p0126.cnldj12r is shown in SEQ ID NO:17; the deduced
amino acid sequence of this cDNA is shown in SEQ ID NO:18. The sequence of the entire
cDNA insert in clone sr1.pk0068.b1 was determined and is shown in SEQ ID NO:19; the
deduced amino acid sequence of this cDNA is shown in SEQ ID NO:20. The amino acid
sequence set forth in SEQ ID NO:20 was evaluated by BLASTP, yielding a pLog value of
127.0 versus the Lycopersicon esculentum sequence.

Figure 3 presents an alignment of the amino acid sequences set forth in SEQ ID
NOs:16, 18 and 20 and the Lycopersicon esculentum sequence (SEQ ID NO:24). The data
in Table 8 present a calculation of the percent identity of the amino acid sequences set forth
in SEQ ID NOs:16, 18 and 20 and the Lycopersicon esculentum sequence.


TABLE 8

Percent Identity of Amino Acid Sequences Deduced From the Nucleotide Sequences
of cDNA Clones Encoding Polypeptides Homologous to Lycopene Epsilon Cyclase

                                        Percent Identity to
Clone                   SEQ ID NO.      3005983

cen3n.pk0135.f10        16              71.1
Contig of:              18              63.2
  p0126.cnldl94r
  p0126.cnldj12r
sr1.pk0068.b1           20              68.6


Sequence alignments and percent similarity calculations were performed using the
Megalign program of the LASERGENE bioinformatics computing suite (DNASTAR Inc.,
Madison, WI). Multiple alignment of the amino acid sequences was performed using the
Clustal method of alignment (Higgins, D. G. and Sharp, P. M. (1989) CABIOS 5:151-153)
with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10).
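
To make the percent identity figures concrete, the following is a minimal sketch of one common way to compute percent identity from two rows of an existing alignment. It is an assumption for illustration only: the exact formula used by Megalign is not stated in the text, and the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither row has a gap.

    Both inputs must be rows taken from the same alignment, so they have
    equal length. Other definitions (for example, dividing by the full
    alignment length) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment, not taken from SEQ ID NOs:16, 18, 20 or 24.
row_1 = "MAV-GLSAAIT"
row_2 = "MAVAGISAA-T"
print(f"{percent_identity(row_1, row_2):.1f}% identity")
```

The choice of denominator matters when alignments contain many gaps, which is one reason identity values from different programs are not always directly comparable.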

Sequence alignments and BLAST scores and probabilities indicate that the instant
nucleic acid fragments encode portions of corn lycopene epsilon cyclase and a nearly entire
soybean lycopene epsilon cyclase. These sequences represent the first corn and soybean
sequences encoding lycopene epsilon cyclase.

EXAMPLE 6

Expression of Chimeric Genes in Monocot Cells
A chimeric gene comprising a cDNA encoding a carotenoid biosynthetic enzyme in
sense orientation with respect to the maize 27 kD zein promoter that is located 5' to the
cDNA fragment, and the 10 kD zein 3' end that is located 3' to the cDNA fragment, can be
constructed. The cDNA fragment of this gene may be generated by polymerase chain
reaction (PCR) of the cDNA clone using appropriate oligonucleotide primers. Cloning sites
(Nco I or Sma I) can be incorporated into the oligonucleotides to provide proper orientation
of the DNA fragment when inserted into the digested vector pML103 as described below.
Amplification is then performed in a standard PCR. The amplified DNA is then digested
with restriction enzymes Nco I and Sma I and fractionated on an agarose gel. The
appropriate band can be isolated from the gel and combined with a 4.9 kb Nco I-Sma I
fragment of the plasmid pML103. Plasmid pML103 has been deposited under the terms of
the Budapest Treaty at ATCC (American Type Culture Collection, 10801 University Blvd.,
Manassas, VA 20110-2209), and bears accession number ATCC 97366. The DNA segment
from pML103 contains a 1.05 kb Sal I-Nco I promoter fragment of the maize 27 kD zein
gene and a 0.96 kb Sma I-Sal I fragment from the 3' end of the maize 10 kD zein gene in the
vector pGem9Zf(+) (Promega). Vector and insert DNA can be ligated at 15°C overnight,
essentially as described (Maniatis). The ligated DNA may then be used to transform E. coli
XL1-Blue (Epicurian Coli XL-1 Blue™; Stratagene). Bacterial transformants can be
screened by restriction enzyme digestion of plasmid DNA and limited nucleotide sequence
analysis using the dideoxy chain termination method (Sequenase™ DNA Sequencing Kit;
U.S. Biochemical). The resulting plasmid construct would comprise a chimeric gene
encoding, in the 5' to 3' direction, the maize 27 kD zein promoter, a cDNA fragment
encoding a carotenoid biosynthetic enzyme, and the 10 kD zein 3' region.
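
The primer design step described above, in which cloning sites are built into the oligonucleotides, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the recognition sequences for Nco I (CCATGG) and Sma I (CCCGGG) are standard, but the function names, the annealing length, the two-base 5' clamp and the example coding sequence are all hypothetical.

```python
NCO_I = "CCATGG"  # recognition site; contains an ATG start codon
SMA_I = "CCCGGG"  # recognition site; blunt cutter

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def design_primers(cds: str, anneal_len: int = 18, clamp: str = "GC"):
    """Return (forward, reverse) primers carrying Nco I and Sma I sites.

    The forward primer places the Nco I site over the start codon. Because
    the site is CCATGG, the base right after ATG is forced to G, which may
    change the second codon of the insert.
    """
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must start with ATG")
    if len(cds) < anneal_len:
        raise ValueError("coding sequence shorter than annealing length")

    forward = clamp + NCO_I + cds[4:anneal_len]
    reverse = clamp + SMA_I + reverse_complement(cds[-anneal_len:])
    return forward, reverse


# Hypothetical coding sequence, used only to exercise the function.
example_cds = "ATGGCTGCAGTTACCGGTAAACTGTTCGAATAA"
fwd, rev = design_primers(example_cds)
print("forward:", fwd)
print("reverse:", rev)
```

Placing different sites at the two ends is what fixes the orientation of the amplified fragment relative to the promoter once both fragment and vector have been cut with the same pair of enzymes.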

The chimeric gene described above can then be introduced into corn cells by the
following procedure. Immature corn embryos can be dissected from developing caryopses
derived from crosses of the inbred corn lines H99 and LH132. The embryos are isolated 10
to 11 days after pollination when they are 1.0 to 1.5 mm long. The embryos are then placed
with the axis-side facing down and in contact with agarose-solidified N6 medium (Chu
et al. (1975) Sci. Sin. Peking 18:659-668). The embryos are kept in the dark at 27°C.
Friable embryogenic callus consisting of undifferentiated masses of cells with somatic
proembryoids and embryoids borne on suspensor structures proliferates from the scutellum
of these immature embryos. The embryogenic callus isolated from the primary explant can
be cultured on N6 medium and sub-cultured on this medium every 2 to 3 weeks.

The plasmid p35S/Ac (obtained from Dr. Peter Eckes, Hoechst Ag, Frankfurt,
Germany) may be used in transformation experiments in order to provide for a selectable
marker. This plasmid contains the Pat gene (see European Patent Publication 0 242 236)
which encodes phosphinothricin acetyl transferase (PAT). The enzyme PAT confers
resistance to herbicidal glutamine synthetase inhibitors such as phosphinothricin. The pat
gene in p35S/Ac is under the control of the 35S promoter from Cauliflower Mosaic Virus
(Odell et al. (1985) Nature 313:810-812) and the 3' region of the nopaline synthase gene
from the T-DNA of the Ti plasmid of Agrobacterium tumefaciens.

The particle bombardment method (Klein T. M. et al. (1987) Nature 327:70-73) may
be used to transfer genes to the callus culture cells. According to this method, gold particles
(1 µm in diameter) are coated with DNA using the following technique. Ten µg of plasmid
DNAs are added to 50 µL of a suspension of gold particles (60 mg per mL). Calcium
chloride (50 µL of a 2.5 M solution) and spermidine free base (20 µL of a 1.0 M solution)
are added to the particles. The suspension is vortexed during the addition of these solutions.
After 10 minutes, the tubes are briefly centrifuged (5 sec at 15,000 rpm) and the supernatant
removed. The particles are resuspended in 200 µL of absolute ethanol, centrifuged again
and the supernatant removed. The ethanol rinse is performed again and the particles
resuspended in a final volume of 30 µL of ethanol. An aliquot (5 µL) of the DNA-coated
gold particles can be placed in the center of a Kapton™ flying disc (Bio-Rad Labs). The
particles are then accelerated into the corn tissue with a Biolistic™ PDS-1000/He (Bio-Rad
Instruments, Hercules, CA), using a helium pressure of 1000 psi, a gap distance of 0.5 cm
and a flying distance of 1.0 cm.
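
The quantities in the coating step above can be checked with a short calculation. This is a rough sketch under two assumptions that are not stated in the text: the volume of the added DNA solution is ignored, and no gold is lost during the ethanol rinses. Variable and function names are illustrative.

```python
def final_molarity(stock_molar: float, stock_ul: float, total_ul: float) -> float:
    """Concentration after diluting stock_ul of stock into total_ul."""
    return stock_molar * stock_ul / total_ul


# Volumes and concentrations as given for the corn protocol.
gold_ul, gold_mg_per_ml = 50.0, 60.0
cacl2_ul, cacl2_molar = 50.0, 2.5
spermidine_ul, spermidine_molar = 20.0, 1.0

# DNA volume is not stated in the text and is ignored here (assumption).
total_ul = gold_ul + cacl2_ul + spermidine_ul

gold_mg = gold_ul / 1000.0 * gold_mg_per_ml
cacl2_final = final_molarity(cacl2_molar, cacl2_ul, total_ul)
spermidine_final = final_molarity(spermidine_molar, spermidine_ul, total_ul)

# Particles end up in 30 uL of ethanol and 5 uL is loaded per flying disc.
final_ethanol_ul, aliquot_ul = 30.0, 5.0
gold_per_shot_mg = gold_mg * aliquot_ul / final_ethanol_ul

print(f"gold in the coating reaction: {gold_mg:.1f} mg")
print(f"CaCl2 during coating: about {cacl2_final:.2f} M")
print(f"spermidine during coating: about {spermidine_final:.2f} M")
print(f"gold per bombardment: about {gold_per_shot_mg:.2f} mg")
```

Under these assumptions one coating reaction provides enough particles for six 5 µL aliquots.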

For bombardment, the embryogenic tissue is placed on filter paper over agarose-
solidified N6 medium. The tissue is arranged as a thin lawn and covers a circular area of
about 5 cm in diameter. The petri dish containing the tissue can be placed in the chamber of
the PDS-1000/He approximately 8 cm from the stopping screen. The air in the chamber is
then evacuated to a vacuum of 28 inches of Hg. The macrocarrier is accelerated with a
helium shock wave using a rupture membrane that bursts when the He pressure in the shock
tube reaches 1000 psi.

Seven days after bombardment the tissue can be transferred to N6 medium that
contains glufosinate (2 mg per liter) and lacks casein or proline. The tissue continues to
grow slowly on this medium. After an additional 2 weeks the tissue can be transferred to
fresh N6 medium containing glufosinate. After 6 weeks, areas of about 1 cm in diameter
of actively growing callus can be identified on some of the plates containing the glufosinate-
supplemented medium. These calli may continue to grow when sub-cultured on the
selective medium.

Plants can be regenerated from the transgenic callus by first transferring clusters of
tissue to N6 medium supplemented with 0.2 mg per liter of 2,4-D. After two weeks the
tissue can be transferred to regeneration medium (Fromm et al. (1990) Bio/Technology
8:833-839).

EXAMPLE 7

Expression of Chimeric Genes in Dicot Cells
A seed-specific expression cassette composed of the promoter and transcription
terminator from the gene encoding the β subunit of the seed storage protein phaseolin from
the bean Phaseolus vulgaris (Doyle et al. (1986) J. Biol. Chem. 261:9228-9238) can be used
for expression of the instant carotenoid biosynthetic enzyme in transformed soybean. The
phaseolin cassette includes about 500 nucleotides upstream (5') from the translation initiation
codon and about 1650 nucleotides downstream (3') from the translation stop codon of
phaseolin. Between the 5' and 3' regions are the unique restriction endonuclease sites Nco I
(which includes the ATG translation initiation codon), Sma I, Kpn I and Xba I. The entire
cassette is flanked by Hind III sites.

The cDNA fragment of this gene may be generated by polymerase chain reaction
(PCR) of the cDNA clone using appropriate oligonucleotide primers. Cloning sites can be
incorporated into the oligonucleotides to provide proper orientation of the DNA fragment
when inserted into the expression vector. Amplification is then performed as described
above, and the isolated fragment is inserted into a pUC18 vector carrying the seed
expression cassette.

Soybean embryos may then be transformed with the expression vector comprising
sequences encoding a carotenoid biosynthetic enzyme. To induce somatic embryos,
cotyledons 3-5 mm in length, dissected from surface-sterilized, immature seeds of the
soybean cultivar A2872, can be cultured in the light or dark at 26°C on an appropriate agar
medium for 6-10 weeks. Somatic embryos which produce secondary embryos are then
excised and placed into a suitable liquid medium. After repeated selection for clusters of
somatic embryos which multiply as early, globular-staged embryos, the suspensions are
maintained as described below.

Soybean embryogenic suspension cultures can be maintained in 35 mL liquid media on a
rotary shaker, 150 rpm, at 26°C with fluorescent lights on a 16:8 hour day/night schedule.
Cultures are subcultured every two weeks by inoculating approximately 35 mg of tissue into
35 mL of liquid medium.

Soybean embryogenic suspension cultures may then be transformed by the method of
particle gun bombardment (Klein T. M. et al. (1987) Nature (London) 327:70-73, U.S.
Patent No. 4,945,050). A DuPont Biolistic™ PDS1000/HE instrument (helium retrofit) can
be used for these transformations.

A selectable marker gene which can be used to facilitate soybean transformation is a
chimeric gene composed of the 35S promoter from Cauliflower Mosaic Virus (Odell et al.
(1985) Nature 313:810-812), the hygromycin phosphotransferase gene from plasmid pJR225
(from E. coli; Gritz et al. (1983) Gene 25:179-188) and the 3' region of the nopaline synthase
gene from the T-DNA of the Ti plasmid of Agrobacterium tumefaciens. The seed expression
cassette comprising the phaseolin 5' region, the fragment encoding the carotenoid
biosynthetic enzyme and the phaseolin 3' region can be isolated as a restriction fragment.
This fragment can then be inserted into a unique restriction site of the vector carrying the
marker gene.

To 50 µL of a 60 mg/mL 1 µm gold particle suspension is added (in order): 5 µL
DNA (1 µg/µL), 20 µL spermidine (0.1 M), and 50 µL CaCl2 (2.5 M). The particle
preparation is then agitated for three minutes, spun in a microfuge for 10 seconds and the
supernatant removed. The DNA-coated particles are then washed once in 400 µL 70%
ethanol and resuspended in 40 µL of anhydrous ethanol. The DNA/particle suspension can
be sonicated three times for one second each. Five µL of the DNA-coated gold particles are
then loaded on each macro carrier disk.

Approximately 300-400 mg of a two-week-old suspension culture is placed in an
empty 60x15 mm petri dish and the residual liquid removed from the tissue with a pipette.
For each transformation experiment, approximately 5-10 plates of tissue are normally
bombarded. Membrane rupture pressure is set at 1100 psi and the chamber is evacuated to a
vacuum of 28 inches mercury. The tissue is placed approximately 3.5 inches away from the
retaining screen and bombarded three times. Following bombardment, the tissue can be
divided in half and placed back into liquid and cultured as described above.

Five to seven days post bombardment, the liquid media may be exchanged with fresh
media, and eleven to twelve days post bombardment with fresh media containing 50 mg/mL
hygromycin. This selective media can be refreshed weekly. Seven to eight weeks post
bombardment, green, transformed tissue may be observed growing from untransformed,
necrotic embryogenic clusters. Isolated green tissue is removed and inoculated into
individual flasks to generate new, clonally propagated, transformed embryogenic suspension
cultures. Each new line may be treated as an independent transformation event. These
suspensions can then be subcultured and maintained as clusters of immature embryos or
regenerated into whole plants by maturation and germination of individual somatic embryos.

EXAMPLE 8

Expression of Chimeric Genes in Microbial Cells
The cDNAs encoding the instant carotenoid biosynthetic enzymes can be inserted into
the T7 E. coli expression vector pBT430. This vector is a derivative of pET-3a (Rosenberg
et al. (1987) Gene 56:125-135) which employs the bacteriophage T7 RNA polymerase/T7
promoter system. Plasmid pBT430 was constructed by first destroying the EcoR I and
Hind III sites in pET-3a at their original positions. An oligonucleotide adaptor containing
EcoR I and Hind III sites was inserted at the BamH I site of pET-3a. This created pET-3aM
with additional unique cloning sites for insertion of genes into the expression vector. Then,
the Nde I site at the position of translation initiation was converted to an Nco I site using
oligonucleotide-directed mutagenesis. The DNA sequence of pET-3aM in this region,
5'-CATATGG, was converted to 5'-CCCATGG in pBT430.
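
The site conversion described above can be verified in a few lines. The following is a minimal sketch: the Nde I (CATATG) and Nco I (CCATGG) recognition sequences are standard, but the function name and the flanking bases in the example are hypothetical and are not the actual pET-3aM sequence.

```python
NDE_I = "CATATG"
NCO_I = "CCATGG"


def convert_nde_to_nco(region: str) -> str:
    """Replace 5'-CATATGG with 5'-CCCATGG, as described for pBT430.

    The ATG start codon and the overall length are preserved; only the
    two bases just upstream of the ATG change (AT to CC).
    """
    old, new = "CATATGG", "CCCATGG"
    region = region.upper()
    if region.count(old) != 1:
        raise ValueError("expected exactly one CATATGG in the region")
    return region.replace(old, new)


# Hypothetical flanking sequence around the translation start.
before = "AAGGAGATATACATATGGCTAGC"
after = convert_nde_to_nco(before)

print("before:", before, "| Nde I:", NDE_I in before, "| Nco I:", NCO_I in before)
print("after: ", after, "| Nde I:", NDE_I in after, "| Nco I:", NCO_I in after)
```

Because the ATG itself is untouched, a coding sequence cloned at the new Nco I site is still translated from the original initiation position.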

Plasmid DNA containing a cDNA may be appropriately digested to release a nucleic
acid fragment encoding the protein. This fragment may then be purified on a 1% NuSieve
GTG™ low melting agarose gel (FMC). Buffer and agarose contain 10 µg/ml ethidium
bromide for visualization of the DNA fragment. The fragment can then be purified from the
agarose gel by digestion with GELase™ (Epicentre Technologies) according to the
manufacturer's instructions, ethanol precipitated, dried and resuspended in 20 µL of water.
Appropriate oligonucleotide adapters may be ligated to the fragment using T4 DNA ligase
(New England Biolabs, Beverly, MA). The fragment containing the ligated adapters can be
purified from the excess adapters using low melting agarose as described above. The vector
pBT430 is digested, dephosphorylated with alkaline phosphatase (NEB) and deproteinized
with phenol/chloroform as described above. The prepared vector pBT430 and fragment can
then be ligated at 16°C for 15 hours followed by transformation into DH5 electrocompetent
cells (GIBCO BRL). Transformants can be selected on agar plates containing LB media and
100 µg/mL ampicillin. Transformants containing the gene encoding the carotenoid
biosynthetic enzyme are then screened for the correct orientation with respect to the T7
promoter by restriction enzyme analysis.

For high level expression, a plasmid clone with the cDNA insert in the correct
orientation relative to the T7 promoter can be transformed into E. coli strain BL21(DE3)
(Studier et al. (1986) J. Mol. Biol. 189:113-130). Cultures are grown in LB medium
containing ampicillin (100 mg/L) at 25°C. At an optical density at 600 nm of approximately
1, IPTG (isopropylthio-β-galactoside, the inducer) can be added to a final concentration of
0.4 mM and incubation can be continued for 3 h at 25°C. Cells are then harvested by
centrifugation and re-suspended in 50 µL of 50 mM Tris-HCl at pH 8.0 containing 0.1 mM
DTT and 0.2 mM phenyl methylsulfonyl fluoride. A small amount of 1 mm glass beads can
be added and the mixture sonicated 3 times for about 5 seconds each time with a microprobe
sonicator. The mixture is centrifuged and the protein concentration of the supernatant
determined. One µg of protein from the soluble fraction of the culture can be separated by
SDS-polyacrylamide gel electrophoresis. Gels can be observed for protein bands migrating
at the expected molecular weight.

EXAMPLE 9

Evaluating Compounds for Their Ability to Inhibit the Activity
of Carotenoid Biosynthetic Enzymes
The carotenoid biosynthetic enzymes described herein may be produced using any
number of methods known to those skilled in the art. Such methods include, but are not
limited to, expression in bacteria as described in Example 8, or expression in eukaryotic cell
culture, in planta, and using viral expression systems in suitably infected organisms or cell
lines. The instant carotenoid biosynthetic enzymes may be expressed either as mature forms
of the proteins as observed in vivo or as fusion proteins by covalent attachment to a variety
of enzymes, proteins or affinity tags. Common fusion protein partners include glutathione
S-transferase ("GST"), thioredoxin ("Trx"), maltose binding protein, and C- and/or
N-terminal hexahistidine polypeptide ("(His)6"). The fusion proteins may be engineered
with a protease recognition site at the fusion point so that fusion partners can be separated by
protease digestion to yield intact mature enzyme. Examples of such proteases include
thrombin, enterokinase and factor Xa. However, any protease can be used which specifically
cleaves the peptide connecting the fusion protein and the enzyme.

Purification of the instant carotenoid biosynthetic enzymes, if desired, may utilize any
number of separation technologies familiar to those skilled in the art of protein purification.
Examples of such methods include, but are not limited to, homogenization, filtration,
centrifugation, heat denaturation, ammonium sulfate precipitation, desalting, pH
precipitation, ion exchange chromatography, hydrophobic interaction chromatography and
affinity chromatography, wherein the affinity ligand represents a substrate, substrate analog
or inhibitor. When the carotenoid biosynthetic enzymes are expressed as fusion proteins, the
purification protocol may include the use of an affinity resin which is specific for the fusion
protein tag attached to the expressed enzyme or an affinity resin containing ligands which
are specific for the enzyme. For example, a carotenoid biosynthetic enzyme may be
expressed as a fusion protein coupled to the C-terminus of thioredoxin. In addition, a (His)6
peptide may be engineered into the N-terminus of the fused thioredoxin moiety to afford
additional opportunities for affinity purification. Other suitable affinity resins could be
synthesized by linking the appropriate ligands to any suitable resin such as Sepharose-4B. In
an alternate embodiment, a thioredoxin fusion protein may be eluted using dithiothreitol;
however, elution may be accomplished using other reagents which interact to displace the
thioredoxin from the resin. These reagents include β-mercaptoethanol or other reduced
thiol. The eluted fusion protein may be subjected to further purification by traditional means
as stated above, if desired. Proteolytic cleavage of the thioredoxin fusion protein and the
enzyme may be accomplished after the fusion protein is purified or while the protein is still
bound to the ThioBond™ affinity resin or other resin.

Crude, partially purified or purified enzyme, either alone or as a fusion protein, may be
utilized in assays for the evaluation of compounds for their ability to inhibit enzymatic
activation of the carotenoid biosynthetic enzymes disclosed herein. Assays may be
conducted under well known experimental conditions which permit optimal enzymatic
activity. For example, assays for beta carotene hydroxylase are presented by Sun Z. et al.
(1996) J. Biol. Chem. 271:24349-24352. Assays for lycopene cyclase and lycopene epsilon
cyclase are presented by Cunningham F. X. (1994) Plant Cell 6:1107-1121.

CLAIMS

What is claimed is:

1. An isolated nucleic acid fragment encoding all or a substantial portion of a
lycopene epsilon cyclase comprising a member selected from the group consisting of:

(a) an isolated nucleic acid fragment encoding all or a substantial portion of
the amino acid sequence set forth in a member selected from the group
consisting of SEQ ID NO:16, SEQ ID NO:18 and SEQ ID NO:20;

(b) an isolated nucleic acid fragment that is substantially similar to an
isolated nucleic acid fragment encoding all or a substantial portion of
the amino acid sequence set forth in a member selected from the group
consisting of SEQ ID NO:16, SEQ ID NO:18 and SEQ ID NO:20; and

(c) an isolated nucleic acid fragment that is complementary to (a) or (b).

2. The isolated nucleic acid fragment of Claim 1 wherein the nucleotide sequence
of the fragment comprises all or a portion of the sequence set forth in a member selected
from the group consisting of SEQ ID NO:15, SEQ ID NO:17 and SEQ ID NO:19.

3. A chimeric gene comprising the nucleic acid fragment of Claim 1 operably
linked to suitable regulatory sequences.

4. A transformed host cell comprising the chimeric gene of Claim 3.

5. A lycopene epsilon cyclase polypeptide comprising all or a substantial portion
of the amino acid sequence set forth in a member selected from the group consisting of SEQ
ID NO:16, SEQ ID NO:18 and SEQ ID NO:20.

6. An isolated nucleic acid fragment encoding all or a substantial portion of a beta
carotene hydroxylase comprising a member selected from the group consisting of:

(a) an isolated nucleic acid fragment encoding all or a substantial portion of
the amino acid sequence set forth in a member selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6 and SEQ ID
NO:8;

(b) an isolated nucleic acid fragment that is substantially similar to an
isolated nucleic acid fragment encoding all or a substantial portion of
the amino acid sequence set forth in a member selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6 and SEQ ID
NO:8; and

(c) an isolated nucleic acid fragment that is complementary to (a) or (b).

7. The isolated nucleic acid fragment of Claim 6 wherein the nucleotide sequence
of the fragment comprises all or a portion of the sequence set forth in a member selected
from the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5 and SEQ ID
NO:7.

8. A chimeric gene comprising the nucleic acid fragment of Claim 6 operably
linked to suitable regulatory sequences.

9. A transformed host cell comprising the chimeric gene of Claim 8.

10. A beta carotene hydroxylase polypeptide comprising all or a substantial portion
of the amino acid sequence set forth in a member selected from the group consisting of SEQ
ID NO:2, SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8.

11. An isolated nucleic acid fragment encoding all or a substantial portion of a
lycopene cyclase comprising a member selected from the group consisting of:

(a) an isolated nucleic acid fragment encoding all or a substantial portion of
the amino acid sequence set forth in a member selected from the group
consisting of SEQ ID NO:10, SEQ ID NO:12 and SEQ ID NO:14;

(b) an isolated nucleic acid fragment that is substantially similar to an
isolated nucleic acid fragment encoding all or a substantial portion of
the amino acid sequence set forth in a member selected from the group
consisting of SEQ ID NO:10, SEQ ID NO:12 and SEQ ID NO:14; and

(c) an isolated nucleic acid fragment that is complementary to (a) or (b).

12. The isolated nucleic acid fragment of Claim 11 wherein the nucleotide
sequence of the fragment comprises all or a portion of the sequence set forth in a member
selected from the group consisting of SEQ ID NO:9, SEQ ID NO:11 and SEQ ID NO:13.

13. A chimeric gene comprising the nucleic acid fragment of Claim 11 operably
linked to suitable regulatory sequences.

14. A transformed host cell comprising the chimeric gene of Claim 13.

15. A lycopene cyclase polypeptide comprising all or a substantial portion of the
amino acid sequence set forth in a member selected from the group consisting of SEQ ID
NO:10, SEQ ID NO:12 and SEQ ID NO:14.

16. A method of altering the level of expression of a carotenoid biosynthetic
enzyme in a host cell comprising:

(a) transforming a host cell with the chimeric gene of any of Claims 3, 8
and 13; and

(b) growing the transformed host cell produced in step (a) under conditions
that are suitable for expression of the chimeric gene

wherein expression of the chimeric gene results in production of altered levels of a
carotenoid biosynthetic enzyme in the transformed host cell.

17. A method of obtaining a nucleic acid fragment encoding all or a substantial
portion of the amino acid sequence encoding a carotenoid biosynthetic enzyme comprising:

(a) probing a cDNA or genomic library with the nucleic acid fragment of
any of Claims 1, 6 and 11;

(b) identifying a DNA clone that hybridizes with the nucleic acid fragment
of any of Claims 1, 6 and 11;

(c) isolating the DNA clone identified in step (b); and

(d) sequencing the cDNA or genomic fragment that comprises the clone
isolated in step (c)

wherein the sequenced nucleic acid fragment encodes all or a substantial portion of the
amino acid sequence encoding a carotenoid biosynthetic enzyme.

18. A method of obtaining a nucleic acid fragment encoding a substantial portion
of an amino acid sequence encoding a carotenoid biosynthetic enzyme comprising:

(a) synthesizing an oligonucleotide primer corresponding to a portion of
the sequence set forth in any of SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, 15, 17
and 19; and

(b) amplifying a cDNA insert present in a cloning vector using the
oligonucleotide primer of step (a) and a primer representing sequences
of the cloning vector

wherein the amplified nucleic acid fragment encodes a substantial portion of an amino acid
sequence encoding a carotenoid biosynthetic enzyme.

19. The product of the method of Claim 17.

20. The product of the method of Claim 18.

<110> E. I. du Pont de Nemours and Company

<130> BE-1115

<140>
<141>

<150> 60/083,042

<151> April 24, 1998

<160> 24

<170> Microsoft Office 97



<210> 1

<211> 817

<212> DNA

<213> Zea mays

<400> 1

tctagcct eg 

at ggagggcg 

geggeggteg 

ctgt ggcaca 

gtgttcgcca 

cgeggcat eg 

gcctacarct 

gacgtgccct 

ggcggcgtcc 

gacgagctt g 

aagacgcgt c 

aatgagttct 

taagtactat 

caccatcttc 



gact caegtc 
gccaggtgcc 
ggatggagtt 
t gcacgagt c 
t cgtcaacgc 
igcccggcct 
tcctccacga 
actt ccgccg 
cgt at gggct 
t tagcagt cc 
cagt tgtatg 
agcttt aggc 
gagt ct gtaa 
ctt tcaaaaa 



cat ggccgt c 
ggtgatcgag 
ctgggcgcgg 
gcaccaccgg 
cgcgccggca 
ct gctt egge 
cggcctggt c 
agt ggct gec 
ct t cct ggga 
ggtgagt gaa 
cgttgt ccga 
gagtgggcca 
taatcaggac 
aaaaaaaaaa 



gccgccgt gt 
aeget gggca 
rgggcgcacc 
ccgcgcgagg 
atct ccct cc 
gcgggcctgg 
caccgccgct 
t cgcacaaga 
ccaaaggagc 
get act gata 
acaagegt gt 
aatgagtgtg 
aacaagaa tt 
a a a a a a a 



actat cget t 
cgttcgcgct 
gggcgctgtg 
.gccccttcga 
tcgcctacgg 
ggattacget 
ttccggtcgc 
tacaccacat 
t ggaggaggt 
ct gaagaege 
t cat gggeca 
ggctgcgt cc 
aggt aaagt a 



cagctggcaa 
ct ccgt eggg 
gcacgcctcc 
get caacgac 
ct t ct t caac 
gt teggcatjg' 
ccccat c0cc 
ggacaa/gtafc 
t ggt ggcctg 
aggagaagaa 
gagtgtgcca 
caaggt agta 
atacatctat 



60 
120 
180 
2 4 0. 

>cfo 

360 
420 
480 
540 
600 
660 
720 
780 
817 



<210> 2

<211> 223 

<212> PRT 

<213> Zea mays 

<400> 2 

Ser Ser Leu Gly Val Thr Ser Met Ala Val Ala Ala Val Tyr Tyr Arg
1 5 10 15

Phe Ser Trp Gln Met Glu Gly Gly Glu Val Pro Val Ile Glu Thr Leu
20 25 30

Gly Thr Phe Ala Leu Ser Val Gly Ala Ala Val Gly Met Glu Phe Trp
35 40 45

Ala Arg Trp Ala His Arg Ala Leu Trp His Ala Ser Leu Trp His Met
50 55 60

His Glu Ser His His Arg Pro Arg Glu Gly Pro Phe Glu Leu Asn Asp
65 70 75 80

Val Phe Ala Ile Val Asn Ala Ala Pro Ala Ile Ser Leu Leu Ala Tyr
85 90 95

Gly Phe Phe Asn Arg Gly Ile Val Pro Gly Leu Cys Phe Gly Ala Gly
100 105 110

Leu Gly Ile Thr Leu Phe Gly Met Ala Tyr Met Phe Val His Asp Gly
115 120 125

Leu Val His Arg Arg Phe Pro Val Gly Pro Ile Ala Asp Val Pro Tyr
130 135 140

Phe Arg Arg Val Ala Ala Ser His Lys Ile His His Met Asp Lys Phe
145 150 155 160

Gly Gly Val Pro Tyr Gly Leu Phe Leu Gly Pro Lys Glu Leu Glu Glu
165 170 175

Val Gly Gly Leu Asp Glu Leu Val Ser Ser Pro Val Ser Glu Ala Thr
180 185 190

Asp Thr Glu Asp Ala Gly Glu Glu Lys Thr Arg Pro Val Val Cys Val
195 200 205

Val Arg Thr Ser Val Phe Met Gly Gln Ser Val Pro Asn Glu Phe
210 215 220

<210> 3 

<211> 482 

<212> DNA 

<213> Oryza sativa 

<220> 

<221> unsure 
<222> (15)

<400> 3 

ttcgccaacg tcccntectt ccggcgggrc gccgccgccc accagatace tcacatgcac 60 

aagttcgaag gtgtgccata tcggctgttc cttggtccca aggagctgga ggaggtgggt 120 

gggattgagg agctggaqaa ggagatcaag agcacgatta agaggaaaga gaccttagat 180 

gcgatccaat cagagctact tatcacacat ttttttcttt ttatttgctg tcatctctta 240 

tt agtttttg cttcagiggc aattgtagcc ttctaatatg cttttctgtc ctgttaatag -300 

aagaataatt tacataaata tttggaacct catttggacg grggagcaca caggtattag 360 

aaatgatgtg ctegggctga atggctgatg ettttgetag tactgctggt aatgatatat 420 

attaatcagt accattctga atttattaaa aaaaaaaaaa aaaaaaaaaa actcgatgta 4 80 

ag 482 

<210> 4 

<211> 63 

<212> PRT 

<213> Oryza sativa 

<400> 4

Phe Ala Asn Val Pro Tyr Phe Arg Arg Val Ala Ala Ala His Gln Ile
1 5 10 15

His His Met Asp Lys Phe Glu Gly Val Pro Tyr Gly Leu Phe Leu Gly
20 25 30

Pro Lys Glu Leu Glu Glu Val Gly Gly Ile Glu Glu Leu Glu Lys Glu
35 40 45

Ile Lys Arg Arg Ile Lys Arg Lys Glu Thr Leu Asp Ala Ile Gln
50 55 60



<210> 5

<211> 582

<212> DNA

<213> Oryza sativa

<220>

<221> unsure

<222> (428)

<220>

<221> unsure

<222> (569)..(570)

<220>

<221> unsure

<222> (580)

<400> 5
t acsQCocaa 
gacccaaggt 
car gacaaag 
gt gcgtgcag 
aat caaccgg 
gagct agcag 
tcatteattg 
atgtgaangg 
aaatt t aaga 
ctcatcccot 



gatacaccac 
gcgtgactgg 
caatcctggt 
gagctggagg 
acctt£tgat 
a.at tgcctt t 
t cggtaccgg 
tgaagacaca 
gat gt cgtgg 
ct gtagacac 



accgcCccCt 
tqcctatGca 
ttccttqtec 
aggt t get eg 
1 1 tgacggcg 
gccttgcaac 

6G6C53CCCG 

gt aat t gt 1 1 
at ggt at ct t 
aat accttnn 



t egagggegt 
tttgtcgt ct 
cgt ctt ctcg 
tctggaagag 
geaattgeag 
attttttgta 
acccagaagc 
at tgt ggct g 
aat t cacgeg 
aacat gt aan 



accgt at ggg 
at atgt get t 
atetcaaegt 
ct ggagaagg 
ctagcaagac 
tt tct ggccg 
ggaacget t c 
t ggat cat eg 
caactttttc 
tt 



ct gttt ct t g 
gtgaccagta 
gtgt gt ggt c 
agettgegag 
t gcgcctgtc 
ctt egtcgae 
gagtat acac 
t cctgcgct g 
gt gaact t gt 



,60 
120 
180 
240 
30C 
•360 
420 
480 
540 
582 



<210> 6 

<211> 48 

<212> PRT 

<213> Oryza sativa 

<400> 6 

Ala Arg Val Gln His Lys Ile His His Thr Asp Lys Phe Glu Gly Val
1 5 10 15

Pro Tyr Gly Leu Phe Leu Gly Pro Lys Val Glu Leu Glu Glu Val Gly
20 25 30

Gly Leu Glu Glu Leu Glu Lys Glu Leu Ala Arg Ile Asn Arg Ser Leu
35 40 45



<210> 7
<211> 604
<212> DNA
<213> Glycine max

<220>
<221> unsure
<222> (453)

<220>
<221> unsure
<222> (456)

<220>
<221> unsure
<222> (492)

<220>
<221> unsure
<222> (494)

<220>
<221> unsure
<222> (504)

<220>
<221> unsure
<222> (522)

<220>
<221> unsure
<222> (527)

<220>
<221> unsure
<222> (565)

<220>
<221> unsure
<222> (569)

<220>
<221> unsure
<222> (599)

<220>
<221> unsure
<222> (604)

<400> 7

acaatctcac ctctcatctc tctctctetc tctctctaag acacacaaac actacacatc 60

[illegible] at^g.ttctca agtttcaaca ctttctagtt ctaccatggc ggtaggactc 120

tccgccgcca taaccatgaa gtccctcctc cgtttccacc aacctcacct gaacctcccc 180

aaatcaatcc caacaacact ccctttctcc cccatgagaa ttttccacca cacagcatca 240

ccaagaaccc aaaaagttea actttcaccc tttgtgtcct cetgeaggac ccaaaacaag 300

gcacccacat ggaaattgag ccacaagaac aacctcctcc tcctcctcct cctccccgct 360

caaacaattc tgtcaacaaa agttgggtga gaaattgggc aagaaaagat ccgaagaggt 420

cacttacctt gttgectget gtcaagtcta gentcnggat cacatccaac gcaatgtcgc 480

tgttaatcca anantctcgt gggnaaatgg agggtggaga antgeentgg tctgaatgtt 540

tggcacattt getcaacagt tgganctccg gtgggtatgg aatttggcna atgggecana 600

ggcn 604



<210> 8 

<211> 314 

<212> PRT 

<213> Glycine max

<400> 8

Met Ala Val Gly Leu Ser Ala Ala Ile Thr Met Lys Ser Leu Leu Arg
1 5 10 15

Phe His Gln Pro His Leu Asn Leu Pro Lys Ser Ile Pro Thr Thr Leu
20 25 30

Pro Phe Ser Pro Met Arg Ile Phe His His Thr Ala Ser Pro Arg Thr
35 40 45

Gln Lys Val Ser Thr Phe Thr Val Cys Val Leu Met Gln Asp Pro Lys
50 55 60

Gln Gly Thr His Met Glu Ile Glu Pro Gln Glu Gln Pro Pro Pro Pro
65 70 75 80

Pro Pro Pro Pro Ala Gln Gln Val Leu Ser Pro Lys Leu Ala Glu Lys
85 90 95

Leu Ala Arg Lys Glu Ser Glu Arg Phe Thr Tyr Leu Val Ala Ala Val
100 105 110

Met Ser Ser Phe Gly Ile Thr Ser Met Ala Val Phe Ala Val Tyr Cys
115 120 125

Arg Phe Ser Trp Gln Met Glu Gly Gly Glu Val Pro Trp Ser Glu Met
130 135 140

Phe Gly Thr Phe Ala Leu Ser Val Gly Ala Ala Val Gly Met Glu Phe
145 150 155 160

Trp Ala Arg Trp Ala His Arg Ala Leu Trp His Ala Ser Leu Trp His
165 170 175

Met His Glu Ser His His Arg Pro Arg Glu Gly Pro Phe Glu Leu Asn
180 185 190

Asp Val Phe Ala Ile Ile Asn Ala Val Pro Ala Ile Ala Leu Leu Ser
195 200 205

Tyr Gly Phe Phe Asn Lys Gly Leu Val Pro Gly Leu Cys Phe Gly Ala
210 215 220

Gly Leu Gly Ile Thr Val Phe Gly Met Ala Tyr Met Phe Val His Asp
225 230 235 240

Gly Leu Val His Lys Arg Phe Pro Val Gly Pro Ile Ala Asn Val Pro
245 250 255

Tyr Leu Arg Arg Val Ala Ser Ala His Gln Leu His His Ser Glu Lys
260 265 270

Phe Asn Gly Val Pro Tyr Gly Leu Phe Leu Gly Pro Lys Glu Ile Glu
275 280 285

Glu Val Gly Gly Leu Glu Glu Leu Glu Lys Glu Ile Ser Arg Arg Ala
290 295 300

Arg Ser Tyr Lys Ile Ala Asn Glu Asn Asn
305 310

<210> 10 

<211> 314 

<212> PRT 

<213> Glycine max

<400> 10

Met Ala Val Gly Leu Ser Ala Ala Ile Thr Met Lys Ser Leu Leu Arg
1 5 10 15

Phe His Gln Pro His Leu Asn Leu Pro Lys Ser Ile Pro Thr Thr Leu
20 25 30

Pro Phe Ser Pro Met Arg Ile Phe His His Thr Ala Ser Pro Arg Thr
35 40 45

Gln Lys Val Ser Thr Phe Thr Val Cys Val Leu Met Gln Asp Pro Lys
50 55 60

Gln Gly Thr His Met Glu Ile Glu Pro Gln Glu Gln Pro Pro Pro Pro
65 70 75 80

Pro Pro Pro Pro Ala Gln Gln Val Leu Ser Pro Lys Leu Ala Glu Lys
85 90 95

Leu Ala Arg Lys Glu Ser Glu Arg Phe Thr Tyr Leu Val Ala Ala Val
100 105 110

Met Ser Ser Phe Gly Ile Thr Ser Met Ala Val Phe Ala Val Tyr Cys
115 120 125

Arg Phe Ser Trp Gln Met Glu Gly Gly Glu Val Pro Trp Ser Glu Met
130 135 140

Phe Gly Thr Phe Ala Leu Ser Val Gly Ala Ala Val Gly Met Glu Phe
145 150 155 160

Trp Ala Arg Trp Ala His Arg Ala Leu Trp His Ala Ser Leu Trp His
165 170 175

Met His Glu Ser His His Arg Pro Arg Glu Gly Pro Phe Glu Leu Asn
180 185 190

Asp Val Phe Ala Ile Ile Asn Ala Val Pro Ala Ile Ala Leu Leu Ser
195 200 205

Tyr Gly Phe Phe Asn Lys Gly Leu Val Pro Gly Leu Cys Phe Gly Ala
210 215 220

Gly Leu Gly Ile Thr Val Phe Gly Met Ala Tyr Met Phe Val His Asp
225 230 235 240

Gly Leu Val His Lys Arg Phe Pro Val Gly Pro Ile Ala Asn Val Pro
245 250 255

Tyr Leu Arg Arg Val Ala Ser Ala His Gln Leu His His Ser Glu Lys
260 265 270

Phe Asn Gly Val Pro Tyr Gly Leu Phe Leu Gly Pro Lys Glu Ile Glu
275 280 285

Glu Val Gly Gly Leu Glu Glu Leu Glu Lys Glu Ile Ser Arg Arg Ala
290 295 300

Arg Ser Tyr Lys Ile Ala Asn Glu Asn Asn
305 310
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<210> 12 

<211> 254 

<212> PRT

<213> Glycine max

<400> 12

Met Gln Asp Pro Lys Gln Gly Thr His Met Glu Ile Glu Pro Gln Glu
1 5 10 15

Gln Pro Pro Pro Pro Pro Pro Pro Ala Gln Gln Val Leu Ser Pro Lys
20 25 30

Leu Ala Glu Lys Leu Ala Arg Lys Glu Ser Glu Arg Phe Thr Tyr Leu
35 40 45

Val Ala Ala Val Met Ser Ser Phe Gly Ile Thr Ser Met Ala Val Phe
50 55 60

Ala Val Tyr Cys Arg Phe Ser Trp Gln Met Glu Gly Gly Glu Val Pro
65 70 75 80

Trp Ser Glu Met Phe Gly Thr Phe Ala Leu Ser Val Gly Ala Ala Val
85 90 95

Gly Met Glu Phe Trp Ala Arg Trp Ala His Arg Ala Leu Trp His Ala
100 105 110

Ser Leu Trp His Met His Glu Ser His His Arg Pro Arg Glu Gly Pro
115 120 125

Phe Glu Leu Asn Asp Val Phe Ala Ile Ile Asn Ala Val Pro Ala Ile
130 135 140

Ala Leu Leu Ser Tyr Gly Phe Phe Asn Lys Gly Leu Val Pro Gly Leu
145 150 155 160

Cys Phe Gly Ala Gly Leu Gly Ile Thr Val Phe Gly Met Ala Tyr Met
165 170 175

Phe Val His Asp Gly Leu Val His Lys Arg Phe Pro Val Gly Pro Ile
180 185 190

Ala Asn Val Pro Tyr Leu Arg Arg Val Ala Ser Ala His Gln Leu His
195 200 205

His Ser Glu Lys Phe Asn Gly Val Pro Tyr Gly Leu Phe Leu Gly Pro
210 215 220

Lys Glu Ile Glu Glu Val Gly Gly Leu Glu Glu Leu Glu Lys Glu Ile
225 230 235 240

Ser Arg Arg Ala Arg Ser Tyr Lys Ile Ala Asn Glu Asn Asn
245 250

<210> 13 
<211> 396 
<212> DNA 

<213> Triticum aestivum

<220>
<221> unsure
<222> (224)

<220>
<221> unsure
<222> (380)

<400> 13 

gcaccgcgcc ctctcccacg cgtcgctgtg ggacatgcac gagtcccacc acctcccccg 60 

cgacgggccc ttcgagctca acgacgtgtt cgccatcgtc aacgccgtcc ccgccatggc 120

cctcctcccc ttcggcttct tcaaccgcgg cctcctcccc ggcctctcct tcggcgccgg 180

tctaggcatc acgctgttcg ggatcgccta catgttcgtc cacnacggcc tggtgcaccg 240

ccacttcccc ciQGccccca tccagaacgt gccctacttc cggcgagtcg ccgccgcgca 300

ccatattcat cacatggaca agttcgacag cctgccatac gggctcttcc tcggccccaa 360

ggagctgcag gaggtggggn aggatcttct tctttt 396

<210> 14 

<211> 131 

<212> PRT

<213> Triticum aestivum

<220>

<221> UNSURE

<222> (75)

<220>

<221> UNSURE

<222> (127)

<400> 14

His Arg Ala Leu Trp His Ala Ser Leu Trp Asp Met His Glu Ser His
1 5 10 15

His Leu Pro Arg Asp Gly Pro Phe Glu Leu Asn Asp Val Phe Ala Ile
20 25 30

Val Asn Ala Val Pro Ala Met Ala Leu Leu Ala Phe Gly Phe Phe Asn
35 40 45

Arg Gly Leu Leu Pro Gly Leu Cys Phe Gly Ala Gly Leu Gly Ile Thr
50 55 60

Leu Phe Gly Met Ala Tyr Met Phe Val His Xaa Gly Leu Val His Arg
65 70 75 80

Arg Phe Pro Val Gly Pro Ile Glu Asn Val Pro Tyr Phe Arg Arg Val
85 90 95

Ala Ala Ala His His Ile His His Met Asp Lys Phe Asp Ser Val Pro
100 105 110

Tyr Gly Leu Phe Leu Gly Pro Lys Glu Leu Glu Glu Val Gly Xaa Asp
115 120 125

Leu Leu Leu
130

<210> 15 
<211> 1358
<212> DNA
<213> Zea mays

<220>
<221> unsure
<222> (12)

<220>
<221> unsure
<222> (57)

<220>
<221> unsure
<222> (160)

<220>
<221> unsure
<222> (169)

<220>
<221> unsure
<222> (193)

<220>
<221> unsure
<222> (305)

<220>
<221> unsure
<222> (324)

<400> 15

ggcgccgcca antcgctcga ccgcccctac gcccgcgtcg cgcgccgcaa gctcaantcc 60

accatgetgg accgctgcgt ccccaacggc gtcttcttcc accaggccaa ggtcgccaag 120

gccctccact acaacgcctc ciccctcctc atctgtgacn acggcgtcnc cgtcccggca 180

agcgtcgtgc tcnacgccac gggcttctcg cgctgcctcg tgeagtaega caagccgtac 240

aacccgggtt accagttcgc ctacggcatc ctcgccgaag ttcgacgcca cccgttcgac 300

attcnaaaaa tgctcttcat ggantggcgc gactcgcacc tccccaaagg gteggaaate 360

agggagcgca accgccgcat ccccaccttc ctctacgcca tgccccttct ccccaccagg 420

atcttcctcg aggagacgtc cctggtcgcg cgcccggggc tcgccatgga cgacatccag 480

gagcgcatqc ccgcgcgcci caggcacctg ggtatacgcg tecgaagegt cgaggaggac 540

gatcgctccg tcatccccat gggagggccg ctgcccgtcc tgeegcagag ggttgtcggc 600

atcggcggca cggcagggat gctccacccg tccacgggct acatggtggc gcgcacgctt 660

gccaccgcgc ctatcgtggc ggacgccatc gtaaggttcc tegacacegg caccggcaac 720

ggcatgggtg gcctggcagg ggacgcgctc tccgccgagg tgtggaagca gctgtggcca 780

gccaacaggc ggcggcagag cgagttcttc tgcttcggca tggaegtect gctcaagctg 840

gacctccagg gaacgcggcg gttcttcgac gccttctteg acctggagcc acactactgg 900

cacggtttcc tgtcatccag aetgttcctg ccggagctct tgatgttcgg cctcgcactg 960

ttcgggaacg cctccaactc gtcgaggctg gagatcatgg ccaagggcac cgtgcctctt 1020

ggcaagatga ttggcaactt gatacaggac agagatgggt gaggagggta tgtataccta 1080

catttttcac gtgaagatct catctccatt ggatctctga ttttggtatc gatgattttc 1140

actctattta cgatttgcaa acatggattc acaaaacaca gttagcaaca gcagttcagg 1200

ecctcctctc agatatagga artgetgetg caacgctact tcagtatggt gattacagag 1260

gtctatagtt gcacttgcac aetgagggat gtcgtgagaa tetaegtate agatatcatg 1320

gtcttcatat aaaagatcaa atttccaaca aaaaaaaa 1358



<210> 16 

<211> 353 

<212> PRT 

<213> Zea mays

<220>
<221> UNSURE
<222> (4)

<220>
<221> UNSURE
<222> (19)

<220>
<221> UNSURE
<222> (54)

<220>
<221> UNSURE
<222> (57)

<220>
<221> UNSURE
<222> (65)

<220>
<221> UNSURE
<222> (102)

<220>
<221> UNSURE
<222> (108)

<400> 16

Gly Gly Ala Xaa Ser Leu Asp Arg Pro Tyr Ala Arg Val Ala Arg Arg
1 5 10 15

Lys Leu Xaa Ser Thr Met Met Asp Arg Cys Val Ala Asn Gly Val Phe
20 25 30

Phe His Gln Ala Lys Val Ala Lys Ala Val His Tyr Asn Ala Ser Ser
35 40 45

Leu Leu Ile Cys Asp Xaa Gly Val Xaa Val Pro Ala Ser Val Val Leu
50 55 60

Xaa Ala Thr Gly Phe Ser Arg Cys Leu Val Gln Tyr Asp Lys Pro Tyr
65 70 75 80

Asn Pro Gly Tyr Gln Phe Ala Tyr Gly Ile Leu Ala Glu Val Arg Arg
85 90 95

His Pro Phe Asp Ile Xaa Lys Met Leu Phe Met Xaa Trp Arg Asp Ser
100 105 110

His Leu Pro Lys Gly Ser Glu Ile Arg Glu Arg Asn Arg Arg Ile Pro
115 120 125

Thr Phe Leu Tyr Ala Met Pro Leu Leu Pro Thr Arg Ile Phe Leu Glu
130 135 140

Glu Thr Ser Leu Val Ala Arg Pro Gly Leu Ala Met Asp Asp Ile Gln
145 150 155 160

Glu Arg Met Ala Ala Arg Leu Arg His Leu Gly Ile Arg Val Arg Ser
165 170 175

Val Glu Glu Asp Asp Arg Cys Val Ile Pro Met Gly Gly Pro Leu Pro
180 185 190

Val Leu Pro Gln Arg Val Val Gly Ile Gly Gly Thr Ala Gly Met Val
195 200 205

His Pro Ser Thr Gly Tyr Met Val Ala Arg Thr Leu Ala Thr Ala Pro
210 215 220

Ile Val Ala Asp Ala Ile Val Arg Phe Leu Asp Thr Gly Thr Gly Asn
225 230 235 240

Gly Met Gly Gly Leu Ala Gly Asp Ala Leu Ser Ala Glu Val Trp Lys
245 250 255

Gln Leu Trp Pro Ala Asn Arg Arg Arg Gln Arg Glu Phe Phe Cys Phe
260 265 270

Gly Met Asp Val Leu Leu Lys Leu Asp Leu Glu Gly Thr Arg Arg Phe
275 280 285

Phe Asp Ala Phe Phe Asp Leu Glu Pro His Tyr Trp His Gly Phe Leu
290 295 300

Ser Ser Arg Leu Phe Leu Pro Glu Leu Leu Met Phe Gly Leu Ala Leu
305 310 315 320

Phe Gly Asn Ala Ser Asn Ser Ser Arg Leu Glu Ile Met Ala Lys Gly
325 330 335

Thr Val Pro Leu Gly Lys Met Ile Gly Asn Leu Ile Gln Asp Arg Asp
340 345 350

Gly 

<210> 17 

<211> 1731 

<212> DNA

<213> Glycine max 

<220> 

<221> unsure 
<222> (878) 

<400> 17 

gcacgagctt ccaattacca aagtctcttc actatagaca acacatttct gcacaatttt 60 

attgetaett ttctccatca cctaaaaaat qggcactagt ttcatgetat tttctccacc 120 

gcctctgctc aagcctcacc aagtacccct caccactcca ttccctcttc ctcaaaccca 180 

tcacacagca tcaagaaaca agagggtcca caccaccagc aaattcggaa actttcttga 240 

cttcaaaccc gagaacaaac ccgaatcctt agattttcac cttccatggt gccacccttc 300 

tgaccgcaat cgttttgatg tgatcatcat tggtgctcgc cctgcaggca caaggctcgc 360 

cgagcaagtg tccctctatg gagttaaggt ttgttgtgtt gatcctgacc ctctttctgt 420 

gtggcctaac aactatggtg tgtggagaga tgactttgag agccttggtc ttgaggattg 480 

cttggacaaa acatggccca tggcttgtgt ttatgtggat gatggcaaga ccaagtattt 540 

ggatcggtgt tatgggaggg ttegtaggag gaagctcaag gagagattgg ttcaaggctg 600 

tgtctccaat ggggttaggt ttcacaaggc aaaggtgtgc caagttcagc accaagagtt 660 

tgagrccaaa gttttgtgtg atgatggggt ggaattgaac gggagtttgg ttcttgatgc 720 

tagtggattc gcaagtaatt ttgtagcata tgacaaggtg agacaccatg gttttcagat 780 

tgctcatggt ctrttgecro aaggggatga tcaccctttt gatttgtgca aaatgggttt 840

aetcccaccg agggactctc atttggggaa tgaacctnac ttcaaagcta caattcaacc 900

cttcctacct tcctctaatg caatoccaat tcattccaat treararttc ttgagcagac 960

tt<rccttgt ( g agccgtccag rgrtgtctta catccaactc aaaaggagga tggttccaac 1020

getaaggcac ttacgcatra cagtcaagag ggttttggag gatgaaaagt gtttgattcc 1080

aatgggagga cctcttccaa cgaiccccca agaagtgatg gctattggtg gcacttctgg 1140

ggtagtgcac ccttccacag erracarggt ggctacaaca atggctctgg ctcctgrtgt 1200

ggcttttget ataactcaat gccttggctc caccagaatg atcaggggga aacaacttca 1260

tgataaagtt tggaatacca tctggccaat tgagaacacc crcataaggg aattttactc 1320

ttttggaatg gagacttret tcaaacttga cttgaatgga agcaggagtt tctttgaigc 1380

attttttaac ttgaaacctt attactggca agggttcctc tcttcgaggt tgactttgaa 1440

tgagcttctt tggttaagca tatcactctt tggacatgee tcaaatccat ccaggtttga 1500

tattgtcaca aagtgtcctc ttccaatggc taaqatgctg cccaatatcg ctttggagta 1560

cattggatqa tgatgatcat qatgatggaa ggttgtgaaa tgcgtgttta agtttttgtt 1620

tttcattggc tgaacttgea tcttgatttg ttgagttgtc attgaattga tatatatgat 1680

aattragctg ctaataaaaa attaattcag ttccgtttaa aaaaaaaaaa a 1731

<210> 18

<211> 493

<212> PRT

<213> Glycine max

<220>

<221> UNSURE
<222> (264)

<400> 18

Met Gly Thr Ser Phe Met Leu Phe Ser Pro Pro Pro Leu Leu Lys Pro
1 5 10 15

His Gln Val Pro Leu Thr Thr Pro Phe Pro Leu Pro Gln Thr His His
20 25 30

Thr Ala Ser Arg Asn Lys Arg Val His Ser Thr Ser Lys Phe Gly Asn
35 40 45

Phe Leu Asp Phe Lys Pro Glu Asn Lys Pro Glu Ser Leu Asp Phe Asp
50 55 60

Leu Pro Trp Cys His Pro Ser Asp Arg Asn Arg Phe Asp Val Ile Ile
65 70 75 80

Ile Gly Ala Gly Pro Ala Gly Thr Arg Leu Ala Glu Gln Val Ser Leu
85 90 95

Tyr Gly Val Lys Val Cys Cys Val Asp Pro Asp Pro Leu Ser Val Trp
100 105 110

Pro Asn Asn Tyr Gly Val Trp Arg Asp Glu Phe Glu Ser Leu Gly Leu
115 120 125

Glu Asp Cys Leu Asp Lys Thr Trp Pro Met Ala Cys Val Tyr Val Asp
130 135 140

Asp Gly Lys Thr Lys Tyr Leu Asp Arg Cys Tyr Gly Arg Val Gly Arg
145 150 155 160

Arg Lys Leu Lys Glu Arg Leu Val Gln Gly Cys Val Ser Asn Gly Val
165 170 175

Arg Phe His Lys Ala Lys Val Trp Gln Val Gln His Gln Glu Phe Glu
180 185 190

Ser Lys Val Leu Cys Asp Asp Gly Val Glu Leu Lys Gly Ser Leu Val
195 200 205

Val Asp Ala Ser Gly Phe Ala Ser Asn Phe Val Ala Tyr Asp Lys Val
210 215 220

Arg His His Gly Phe Gln Ile Ala His Gly Val Leu Ala Glu Gly Asp
225 230 235 240

Asp His Pro Phe Asp Leu Cys Lys Met Gly Leu Met Gly Arg Arg Asp
245 250 255

Ser His Leu Gly Asn Glu Pro Xaa Leu Lys Ala Arg Ile Gln Gly Phe
260 265 270

Leu Pro Ser Ser Asn Ala Met Pro Ile His Ser Asn Leu Ile Phe Leu
275 280 285

Glu Glu Thr Ser Leu Val Ser Arg Pro Val Leu Ser Tyr Met Glu Val
290 295 300

Lys Arg Arg Met Val Ala Arg Leu Arg His Leu Gly Ile Arg Val Lys
305 310 315 320

Arg Val Leu Glu Asp Glu Lys Cys Leu Ile Pro Met Gly Gly Pro Leu
325 330 335

Pro Arg Ile Pro Gln Glu Val Met Ala Ile Gly Gly Thr Ser Gly Val
340 345 350

Val His Pro Ser Thr Gly Tyr Met Val Ala Arg Thr Met Ala Val Ala
355 360 365

Pro Val Val Ala Phe Ala Ile Thr Gln Cys Leu Gly Ser Thr Arg Met
370 375 380

Ile Arg Gly Lys Gln Leu His Asp Lys Val Trp Asn Ser Met Trp Pro
385 390 395 400

Ile Glu Asn Arg Leu Val Arg Glu Phe Tyr Ser Phe Gly Met Glu Thr
405 410 415

Leu Leu Lys Leu Asp Leu Asn Gly Ser Arg Ser Phe Phe Asp Ala Phe
420 425 430

Phe Asn Leu Lys Pro Tyr Tyr Trp Gln Gly Phe Leu Ser Ser Arg Leu
435 440 445

Thr Leu Asn Glu Leu Leu Trp Leu Ser Ile Ser Leu Phe Gly His Ala
450 455 460

Ser Asn Pro Ser Arg Phe Asp Ile Val Thr Lys Cys Pro Val Pro Met
465 470 475 480

Ala Lys Met Val Gly Asn Ile Ala Leu Glu Tyr Ile Gly
485 490

<210> 19 
<211> 853 
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<212> DNA 
<213> Triticum aestivum

<400> 19

taccgaagtc ggtacggcgt nccatctccc gagegaegtc cctgcccgcg cgcacgggga 60

actccatcce cgacatccae gagcgcatgg ccgccctcct gatgcacctc ggaattcgea 120

tccggagcct cgaggacgac cagcgcrgcg rgatccccat gggeggeteg ctgcccgtgc 180

tgccgcacag ggtggtgggc ctccccccca eggcegggat ggtgcatccc tccacagggt 240

acatcctgcc gcgcatattg gcgaccgcgc ccatcgtggc agactccatt gtgcggtttc 300

tacacactcc aaatggtggc atcgcccggg acgcgctcgc cgccgaggtg tggcccgagc 360

tgtggccgac ggtracgcgg ccgtacaggg aactcttctg cttcggcatc gacgtcctgc 420

tcaagctcqa ccrccaaggt acacgacgct tcttcaacgc attcttccac cttgagccgc 480

actattggca cggcttcctc tcctcgaccc tgetectgea tgacctcctc atgtteggge 540

tctccatctt cgtgcacgct tccaacacgt ccaagctgga gatcatggee aagggcaccc 600

tgcctctttc caacatcgtc ggcaacttea tacaggacaa ggataggtga tgacttagag 660

ggtatgtatg tacctgcatc tcaagatctt catggggtct ttgattrtcg catagctttt 720

tctcaaatci atctatgatt ggcaaagagg atttaaatag cagtggtagc aacagcagct 780

tagcacctcc agagagatgt aacaattctt gctgttgcta ccttattaac atgtatattt 840

cataaaaaaa aaa 853



<210> 20
<211> 215
<212> PRT
<213> Triticum aestivum

<220>
<221> UNSURE
<222> (7)

<400> 20

Pro Lys Ser Val Arg Arg Xaa
1 5 10 15

Arg Thr Gly Asn Ser Met Asp Asp Ile His Glu Arg Met Ala Ala Leu
20 25 30

Leu Met His Leu Gly Ile Arg Ile Arg Ser Val Glu Asp Asp Glu Arg
35 40 45

Cys Val Ile Pro Met Gly Gly Ser Leu Pro Val Leu Pro His Arg Val
50 55 60

Val Gly Ile Gly Gly Thr Ala Gly Met Val His Pro Ser Thr Gly Tyr
65 70 75 80

Met Val Ala Arg Ile Leu Ala Thr Ala Pro Ile Val Ala Asp Ser Ile
85 90 95

Val Arg Phe Leu Asp Thr Ala Asn Gly Gly Ile Ala Gly Asp Ala Leu
100 105 110

Ala Ala Glu Val Trp Pro Glu Leu Trp Pro Thr Val Thr Arg Pro Tyr
115 120 125

Arg Glu Leu Phe Cys Phe Gly Met Asp Val Leu Leu Lys Leu Asp Leu
130 135 140

Gln Gly Thr Arg Arg Phe Phe Asn Ala Phe Phe Asp Leu Glu Pro His
145 150 155 160

Tyr Trp His Gly Phe Leu Ser Ser Arg Leu Leu Leu His Glu Leu Leu
165 170 175

Met Phe Gly Leu Ser Met Phe Val His Ala Ser Asn Thr Ser Lys Leu
180 185 190

Glu Ile Met Ala Lys Gly Thr Leu Pro Leu Ser Lys Met Val Gly Asn
195 200 205

Leu Ile Gln Asp Lys Asp Arg
210 215

<210> 21 

<211> 294

<212> PRT

<213> Arabidopsis thaliana

<400> 21

Ser Phe Ser Ser Ser Ser Thr Asp Phe Arg Leu Arg Leu Pro Lys Ser
1 5 10 15

Leu Ser Gly Phe Ser Pro Ser Leu Arg Phe Lys Arg Phe Ser Val Cys
20 25 30

Tyr Val Val Glu Glu Arg Arg Gln Asn Ser Pro Ile Glu Asn Asp Glu
35 40 45

Arg Pro Glu Ser Thr Ser Ser Thr Asn Ala Ile Asp Ala Glu Tyr Leu
50 55 60

Ala Leu Arg Leu Ala Glu Lys Leu Glu Arg Lys Lys Ser Glu Arg Ser
65 70 75 80

Thr Tyr Leu Ile Ala Ala Met Leu Ser Ser Phe Gly Ile Thr Ser Met
85 90 95

Ala Val Met Ala Val Tyr Tyr Arg Phe Ser Trp Gln Met Glu Gly Gly
100 105 110

Glu Ile Ser Met Leu Glu Met Phe Gly Thr Phe Ala Leu Ser Val Gly
115 120 125

Ala Ala Val Gly Met Glu Phe Trp Ala Arg Trp Ala His Arg Ala Leu
130 135 140

Trp His Ala Ser Leu Trp Asn Met His Glu Ser His His Lys Pro Arg
145 150 155 160

Glu Gly Pro Phe Glu Leu Asn Asp Val Phe Ala Ile Val Asn Ala Gly
165 170 175

Pro Ala Ile Gly Leu Leu Ser Tyr Gly Phe Phe Asn Lys Gly Leu Val
180 185 190

Pro Gly Leu Cys Phe Gly Ala Gly Leu Gly Ile Thr Val Phe Gly Ile
195 200 205

Ala Tyr Met Phe Val His Asp Gly Leu Val His Lys Arg Phe Pro Val
210 215 220

Gly Pro Ile Ala [illegible] Val Pro Tyr Leu Arg Lys Val Ala Ala Ala His
225 230 235 240

Gln Leu His His Thr Asp Lys Phe Asn Gly Val Pro Tyr Gly Leu Phe
245 250 255

Leu Gly Pro Lys Glu Leu Glu Glu Val Gly Gly Asn Glu Glu Leu Asp
260 265 270

Lys Glu Ile Ser Arg Arg Ile Lys Ser Tyr Lys Lys Ala Ser Gly Ser
275 280 285

Gly Ser Ser Ser Ser Ser
290

<210> 22 

<211> 316 

<212> PRT

<213> Capsicum annuum

<400> 22

Thr Thr Gly Arg Tyr His Tyr Gln Leu Val Trp Cys Gln Ile Ser Phe
1 5 10 15

Ser Ser Thr Ser Arg Thr Ser Tyr Tyr Arg His Ser Pro Phe Leu Gly
20 25 30

Pro Lys Pro Thr Pro Thr Thr Pro Ser Val Tyr Pro Ile Thr Pro Phe
35 40 45

Ser Pro Asn Leu Gly Ser Ile Leu Arg Cys Arg Arg Arg Pro Ser Phe
50 55 60

Thr Val Cys Phe Val Leu Glu Asp Asp Lys Phe Lys Thr Gln Phe Glu
65 70 75 80

Ala Gly Glu Glu Asp Ile Glu Met Lys Ile Glu Glu Gln Ile Ser Ala
85 90 95

Thr Arg Leu Ala Glu Lys Leu Ala Arg Lys Lys Ser Glu Arg Phe Thr
100 105 110

Tyr Leu Val Ala Ala Val Met Ser Ser Phe Gly Ile Thr Ser Met Ala
115 120 125

Val Met Ala Val Tyr Tyr Arg Phe Tyr Trp Gln Met Glu Gly Gly Glu
130 135 140

Val Pro Phe Ser Glu Met Phe Gly Thr Phe Ala Leu Ser Val Gly Ala
145 150 155 160

Ala Val Gly Met Glu Phe Trp Ala Arg Trp Ala His Lys Ala Leu Trp
165 170 175

His Ala Ser Leu Trp His Met His Glu Ser His His Lys Pro Arg Glu
180 185 190

Gly Pro Phe Glu Leu Asn Asp Val Phe Ala Ile Ile Asn Ala Val Pro
195 200 205

Ala Ile Ala Leu Leu Asp Tyr Gly Phe Phe His Lys Gly Leu Ile Pro
210 215 220

Gly Leu Cys Phe Gly Ala Gly Leu Gly Ile Thr Val Phe Gly Met Ala
225 230 235 240

Tyr Met Phe Val His Asp Gly Leu Val His Lys Arg Phe Pro Val Gly
245 250 255

Pro Val Ala Asn Val Pro Tyr Leu Arg Lys Val Ala Ala Ala His Ser
260 265 270

Leu His His Ser Glu Lys Phe Asn Gly Val Pro Tyr Gly Leu Phe Leu
275 280 285

Gly Pro Lys Glu Leu Glu Glu Val Gly Gly Leu Glu Glu Leu Glu Lys
290 295 300

Glu Val Asn Arg Arg Thr Arg Tyr Ile Lys Gly Ser
305 310 315

<210> 23 

<211> 501 

<212> PRT 

<213> Arabidopsis thaliana 

<400> 23 

Met Asp Thr Leu Leu Lys Thr Pro Asn Lys Leu Asp Phe Phe Ile Pro
1 5 10 15

Gln Phe His Gly Phe Glu Arg Leu Cys Ser Asn Asn Pro Tyr Pro Ser
20 25 30

Arg Val Arg Leu Gly Val Lys Lys Arg Ala Ile Lys Ile Val Ser Ser
35 40 45

Val Val Ser Gly Ser Ala Ala Leu Leu Asp Leu Val Pro Glu Thr Lys
50 55 60

Lys Glu Asn Leu Asp Phe Glu Leu Pro Leu Tyr Asp Thr Ser Lys Ser
65 70 75 80

Gln Val Val Asp Leu Ala Ile Val Gly Gly Gly Pro Ala Gly Leu Ala
85 90 95

Val Ala Gln Gln Val Ser Glu Ala Gly Leu Ser Val Cys Ser Ile Asp
100 105 110

Pro Ser Pro Lys Leu Ile Trp Pro Asn Asn Tyr Gly Val Trp Val Asp
115 120 125

Glu Phe Glu Ala Met Asp Leu Leu Asp Cys Leu Asp Thr Thr Trp Ser
130 135 140

Gly Ala Val Val Tyr Val Asp Glu Gly Val Lys Lys Asp Leu Ser Arg
145 150 155 160

Pro Tyr Gly Arg Val Asn Arg Lys Gln Leu Lys Ser Lys Met Leu Gln
165 170 175

Lys Cys Ile Thr Asn Gly Val Lys Phe His Gln Ser Lys Val Thr Asn
180 185 190

Val Val His Glu Glu Ala Asn Ser Thr Val Val Cys Ser Asp Gly Val
195 200 205

Lys Ile Gln Ala Ser Val Val Leu Asp Ala Thr Gly Phe Ser Arg Cys
210 215 220

Leu Val Gln Tyr Asp Lys Pro Tyr Asn Pro Gly Tyr Gln Val Ala Tyr
225 230 235 240

Gly Ile Ile Ala Glu Val Asp Gly His Pro Phe Asp Val Asp Lys Met
245 250 255

Val Phe Met Asp Trp Arg Asp Lys His Leu Asp Ser Tyr Pro Glu Leu
260 265 270

Lys Glu Arg Asn Ser Lys Ile Pro Thr Phe Leu Tyr Ala Met Pro Phe
275 280 285

Ser Ser Asn Arg Ile Phe Leu Glu Glu Thr Ser Leu Val Ala Arg Pro
290 295 300

Gly Leu Arg Met Glu Asp Ile Gln Glu Arg Met Ala Ala Arg Leu Lys
305 310 315 320

His Leu Gly Ile Asn Val Lys Arg Ile Glu Glu Asp Glu Arg Cys Val
325 330 335

Ile Pro Met Gly Gly Pro Leu Pro Val Leu Pro Gln Arg Val Val Gly
340 345 350

Ile Gly Gly Thr Ala Gly Met Val His Pro Ser Thr Gly Tyr Met Val
355 360 365

Ala Arg Thr Leu Ala Ala Ala Pro Ile Val Ala Asn Ala Ile Val Arg
370 375 380

Tyr Leu Gly Ser Pro Ser Ser Asn Ser Leu Arg Gly Asp Gln Leu Ser
385 390 395 400

Ala Glu Val Trp Arg Asp Leu Trp Pro Ile Glu Arg Arg Arg Gln Arg
405 410 415

Glu Phe Phe Cys Phe Gly Met Asp Ile Leu Leu Lys Leu Asp Leu Asp
420 425 430

Ala Thr Arg Arg Phe Phe Asp Ala Phe Phe Asp Leu Gln Pro His Tyr
435 440 445

Trp His Gly Phe Leu Ser Ser Arg Leu Phe Leu Pro Glu Leu Leu Val
450 455 460

Phe Gly Leu Ser Leu Phe Ser His Ala Ser Asn Thr Ser Arg Leu Glu
465 470 475 480

Ile Met Thr Lys Gly Thr Val Pro Leu Ala Lys Met Ile Asn Asn Leu
485 490 495

Val Gln Asp Arg Asp
500

<210> 24 
<211> 526
<212> PRT

<213> Lycopersicon esculentum
<400> 24

Met Glu Cys Val Gly Val Gln Asn Val Gly Ala Met Ala Val Leu Thr
1 5 10 15

Arg Pro Arg Leu Asn Arg Trp Ser Gly Gly Glu Leu Cys Gln Glu Lys
20 25 30

Ser Ile Phe Leu Ala Tyr Glu Gln Tyr Glu Ser Lys Cys Asn Ser Ser
35 40 45

Ser Gly Ser Asp Ser Cys Val Val Asp Lys Glu Asp Phe Ala Asp Glu
50 55 60

Glu Asp Tyr Ile Lys Ala Gly Gly Ser Gln Leu Val Phe Val Gln Met
65 70 75 80

Gln Gln Lys Lys Asp Met Asp Gln Gln Ser Lys Leu Ser Asp Glu Leu
85 90 95

Arg Gln Ile Ser Ala Gly Gln Thr Val Leu Asp Leu Val Val Ile Gly
100 105 110

Cys Gly Pro Ala Gly Leu Ala Leu Ala Ala Glu Ser Ala Lys Leu Gly
115 120 125

Leu Asn Val Gly Leu Val Gly Pro Asp Leu Pro Phe Thr Asn Asn Tyr
130 135 140

Gly Val Trp Glu Asp Glu Phe Lys Asp Leu Gly Leu Gln Ala Cys Ile
145 150 155 160

Glu His Val Trp Arg Asp Thr Ile Val Tyr Leu Asp Asp Asp Glu Pro
165 170 175

Ile Leu Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg His Phe Leu His
180 185 190

Glu Glu Leu Leu Lys Arg Cys Val Glu Ala Gly Val Leu Tyr Leu Asn
195 200 205

Ser Lys Val Asp Arg Ile Val Glu Ala Thr Asn Gly Gln Ser Leu Val
210 215 220

Glu Cys Glu Gly Asp Val Val Ile Pro Cys Arg Phe Val Thr Val Ala
225 230 235 240

Ser Gly Ala Ala Ser Gly Lys Phe Leu Gln Tyr Glu Leu Gly Ser Pro
245 250 255

Arg Val Ser Val Gln Thr Ala Tyr Gly Val Glu Val Glu Val Asp Asn
260 265 270

Asn Pro Phe Asp Pro Ser Leu Met Val Phe Met Asp Tyr Arg Asp Tyr
275 280 285

Leu Arg His Asp Ala Gln Ser Leu Glu Ala Lys Tyr Pro Thr Phe Leu
290 295 300

Tyr Ala Met Pro Met Ser Pro Thr Arg Val Phe Phe Glu Glu Thr Cys
305 310 315 320

Leu Ala Ser Lys Asp Ala Met Pro Phe Asp Leu Leu Lys Lys Lys Leu
325 330 335

Met Leu Arg Leu Asn Thr Leu Gly Val Arg Ile Lys Glu Ile Tyr Glu
340 345 350

Glu Glu Trp Ser Tyr Ile Pro Val Gly Gly Ser Leu Pro Asn Thr Glu
355 360 365

Gln Lys Thr Leu Ala Phe Gly Ala Ala Ala Ser Met Val His Pro Ala
370 375 380

Thr Gly Tyr Ser Val Val Arg Ser Leu Ser Glu Ala Pro Lys Cys Ala
385 390 395 400

Ser Val Leu Ala Asn Ile Leu Arg Gln His Tyr Ser Lys Asn Met Leu
405 410 415

Thr Ser Ser Ser Ile Pro Ser Ile Ser Thr Gln Ala Trp Asn Thr Leu
420 425 430

Trp Pro Gln Glu Arg Lys Arg Gln Arg Ser Phe Phe Leu Phe Gly Leu
435 440 445

Ala Leu Ile Leu Gln Leu Asp Ile Glu Gly Ile Arg Ser Phe Phe Arg
450 455 460

Ala Phe Phe Arg Val Pro Lys Trp Met Trp Gln Gly Phe Leu Gly Ser
465 470 475 480

Ser Leu Ser Ser Ala Asp Leu Met Leu Phe Ala Phe Tyr Met Phe Ile
485 490 495

Ile Ala Pro Asn Asp Met Arg Lys Gly Leu Ile Arg His Leu Leu Ser
500 505 510

Asp Pro Thr Gly Ala Thr Leu Ile Arg Thr Tyr Leu Thr Phe
515 520 525
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